1 Introduction

In this script, we will conduct four sensitivity analyses:

  • SirAE~PA additionally adjusted for ECOG performance status;
  • SirAE~PA restricted to patients who received ipilimumab+nivolumab;
  • SirAE~PA Fine and Gray subdistribution hazard model;
  • OS~PA restricted to patients with melanoma with additional adjustment for lactate dehydrogenase (LDH);
  • OS~PA restricted to patients with irresectable disease;
  • OS~PA restricted to patients with ECOG performance status 0 and 1;
  • OS~PA additionally adjusted for ECOG performance status.

2 Prepare workspace

Because we are preparing the sensitivity analysis, the first part will be the same, but using only the melanoma subgroup and adding the LDH variable.

Empty R environment

rm(list = ls())

Install packages

install.packages("dplyr")
install.packages("tidyr")
install.packages("table1")
install.packages("gtsummary")
install.packages("ggplot2")
install.packages("rms")
install.packages("survival")
install.packages("cmprsk")
install.packages("tidycmprsk")
install.packages("survminer")
install.packages("ggsurvfit")
install.packages("ggpubr")

Activate packages

library(dplyr)
library(tidyr)
library(table1)
library(gtsummary)
library(ggplot2)
library(rms)
library(survival)
library(cmprsk)
library(tidycmprsk)
library(survminer)
library(ggsurvfit)
library(ggpubr)

2.1 Make function to assess linearity assuption of logistic regression

Make function to make binned Residual vs Fitted plot.

residual_lp_binned <- function(data,
                               fit,
                               title = expression(paste("Binned dev. residuals against ",hat(eta))),
                               n_breaks = max(c(10, min(c(floor(nrow(data)/10), 50))))
                               ){
  if(!is.null(fit$na.action)){
    data <- data[-c(fit$na.action),]
  }
  data <- dplyr::mutate(data, residuals=residuals(fit, type = "pearson"),linpred=predict(fit, type = "link"))
  gdf <- dplyr::group_by(data, cut(linpred, breaks=unique(quantile(linpred, (1:(n_breaks))/(n_breaks+1)))))
  diagdf <- dplyr::summarise(gdf, residuals=mean(residuals),linpred=mean(linpred))
  data$residuals <- data$linpred <- NULL
  ggplot2::ggplot(diagdf, aes(y=residuals, x=linpred)
                  )+ ggplot2::geom_hline(yintercept = 0
                                         ,color = "darkgrey"
                  )+ ggplot2::geom_point(
                  )+ ggplot2::labs(x=expression(hat(eta))
                                   ,y="pearson residuals (binned)"
                                   ,title=title
                                   ,caption = deparse1(fit$formula)
                  )+ ggplot2::theme_bw(
                  )
}

2.2 Make functions for baseline tables

We use the table1 package to make a table of patient characteristics. See the vignette of table1 for more details.

The functions below define how data are displayed within the table.

    #function to exclude missings form percentages
my.render.cat <- function(x) {
  c("", sapply(stats.default(x), function(y) with(y,
                                                  if(PCTnoNA%in%c("NaN")){sprintf("")} #if all categories within one variable of a stratum are missing, no output
                                                  else{sprintf("%d (%0.1f%%)", FREQ, PCTnoNA)} #otherwise percentages excluding missings
                                                  )))
}

  #function to show percentage of missings if applicable
my.render.missing <- function(x, ...) {
  with(stats.apply.rounding(stats.default(is.na(x), ...), ...)$Yes,
       if(PCT==100){c(missing = sprintf(""))} #if variable is missing for all subjects within this stratum, don't show missings
       else( c(missing=sprintf("%s (%s%%)", FREQ, PCT))) #otherwise, show percentages of missings
      )
}
  #function to show continuous variables as median [Q1, Q3]
my.render.cont <- function(x, name, ...) {
  with(stats.default(x),
       c("", "median [Q1-Q3]"=sprintf("%s [%s-%s]"
                                      , round_pad(MEDIAN, digits=1)
                                      , round_pad(Q1, digits=1)
                                      , round_pad(Q3, digits=1)
                                      )
         ,"mean (SD)"=sprintf("%s (%s)"
                              , round_pad(MEAN, digits=1)
                              , round_pad(SD, digits=1))
         ))
}

Define working directory and load files

2.3 Load and prepare clinical data

Load the data after preprocessing to obtain physical activity variables from SQUASH questionnaire and clinical variables.

We have two data frames:

  • data_OS: data from all SQUASH respondents with survival data. We will use this for assessing the correlation of physical activity with survival, however melanoma patients only.
  • data_tox: data from all SQUASH respondents with at least one year follow-up. We will use this for assessing the correlation of physical activity with severe irAEs, but only using the melanoma patients.

We will make some additional data frames for the subgroup analyses.

  • OS_melanoma: for the sensitivity analysis of OS restricted to melanoma patients, we first use the subset melanoma patients and include the variable LDH in our analyses.
  • OS_palliative: for the sensitivity analysis of OS restricted to irresectable patients.
  • OS_PS: for the sensitivity analysis of OS restricted to patient with ECOG performance status 0 or 1 (so excluding >1 and unknown).
  • tox_cICI: for the sensitivity analysis of SirAEs restricted to patients who received ipilimumab+nivolumab.
  • Note that we will use data_OS for the Fine and Gray subdistribution hazards model, since this also includes patients who started ICI less than one year before last follow-up.

We double check if we have missing values and if we can incorporate them or we need to exclude them from the sensitivity analysis.

data_OS$OS_status <- as.numeric(as.character(recode(data_OS$OS_status, "no"="0", "yes"="1")))

Make variable with ECOG PS 0, 1 or >1. We will use this variable in sensitivity analyses.

data_OS$B_PS_tri <- factor(ifelse(data_OS$B_PS%in%"0","0",
                           ifelse(data_OS$B_PS%in%"1","1",
                                  ifelse(data_OS$B_PS%in%c("2","3","4"),"2-3",
                                         NA))))
table(data_OS$B_PS_tri, data_OS$B_PS, useNA="always")
##       
##          0   1   2   3   4 unknown <NA>
##   0    126   0   0   0   0       0    0
##   1      0 101   0   0   0       0    0
##   2-3    0   0  16   2   0       0    0
##   <NA>   0   0   0   0   0       6    0
data_tox$B_PS_tri <- factor(ifelse(data_tox$B_PS%in%"0","0",
                           ifelse(data_tox$B_PS%in%"1","1",
                                  ifelse(data_tox$B_PS%in%c("2","3","4"),"2-3",
                                         NA))))
table(data_tox$B_PS_tri, data_tox$B_PS, useNA="always")
##       
##          0   1   2   3   4 unknown <NA>
##   0    103   0   0   0   0       0    0
##   1      0  86   0   0   0       0    0
##   2-3    0   0  13   1   0       0    0
##   <NA>   0   0   0   0   0       6    0

2.4 Specify location of knots

Restricted cubic splines work with a certain number of knots: a hinge point for the distribution which gets smoothed out by a cubic parameter. We can specify the number and locations of knots or we can choose to let the knot locations be chosen by our software. Usually, 3 to 5 knots suffices. Consider that each knot ‘costs’ us a degree of freedom and that too many knots may likely result in overfitting. Therefore, we will use 3 knots here. We make use of the rcspline.eval() function in the Hmisc package.

We will use the same knot locations in all models and sensitivity analyses based on data of all included patients.

Specify knots for MET-hours.

k.pos.totmet_hpwk <- rcspline.eval(data_OS$totmet_hpwk #variable to use splines for
                            , nk=3 #number of knots
                            , inclx=FALSE
                            , knots.only=T #only store knots
                            , type="ordinary"
                            , norm=2
                            , rpm=NULL
                            , pc=FALSE
                            , fractied=0.05)

Specify knots for MVPA-SL.

k.pos.MZvt_HWK <- rcspline.eval(data_OS$MZvt_HWK #variable to use splines for
                            , nk=3 #number of knots
                            , inclx=FALSE
                            , knots.only=T #only store knots
                            , type="ordinary"
                            , norm=2
                            , rpm=NULL
                            , pc=FALSE
                            , fractied=0.05)

3 SirAE~PA - ECOG adjusted

Here, we will rerun the same analyses as primary analyses, but now with additional adjustment for ECOG performance status. ECOG performance status was missing for 6 patients (2.4%). Electronic patient files were checked, data seemed missing completely at random (MCAR). Given the low number of missing values, missingness under MCAR assumption and the analysis being a sensitivity analysis, a complete case analysis was considered appropriate.

3.1 MET-hours and severe irAEs

To get some feeling, we calculate the proportion of patients with severe irAEs among each tertile of total PA.

tbl_summary(data_tox[, c("totmet_hpwk_tertiles", "T1_gr3_1y")]
            , by="totmet_hpwk_tertiles"
            , missing= "always"
            , percent = "column")
Characteristic [0,51], N = 731 (51,101], N = 691 (101,371], N = 671
T1_gr3_1y 21 (29%) 10 (14%) 7 (10%)
    Unknown 0 0 0
1 n (%)

3.1.1 SirAEs~MET-hours logistic regression - tertiles

Adjusted logistic regression for tertiles variable.

glm_tert.adj <- glm(formula = T1_gr3_1y ~ totmet_hpwk_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, family = binomial(link = "logit"), data = data_tox)
glm_tert.adj_PS <- glm(formula = T1_gr3_1y ~ totmet_hpwk_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev +  B_ther_cur_regrouped.1 + B_PS_tri, family = binomial(link = "logit"), data = data_tox)
tbl_merge(list(
  tbl_regression(glm_tert.adj, exponentiate=T),
  tbl_regression(glm_tert.adj_PS, exponentiate=T)
  ),
  tab_spanner = c("without ECOG PS", "with ECOG PS")
)
Characteristic without ECOG PS with ECOG PS
OR1 95% CI1 p-value OR1 95% CI1 p-value
totmet_hpwk_tertiles
    [0,51]
    (51,101] 0.34 0.12, 0.90 0.035 0.32 0.10, 0.88 0.033
    (101,371] 0.19 0.05, 0.55 0.004 0.20 0.06, 0.60 0.006
B_sex
    male
    female 1.45 0.60, 3.48 0.4 1.61 0.64, 4.05 0.3
B_age 1.04 1.00, 1.08 0.062 1.04 1.00, 1.08 0.081
B_tumor_type.1
    melanoma
    NSCLC 1.28 0.16, 9.48 0.8 1.64 0.19, 13.0 0.6
    RCC 0.36 0.09, 1.27 0.12 0.34 0.08, 1.27 0.12
    other 0.85 0.14, 4.70 0.9 1.15 0.18, 7.16 0.9
B_paladj
    palliative
    adjuvant 0.77 0.19, 3.35 0.7 1.09 0.23, 5.69 >0.9
B_ther_prev
    no
    yes 0.24 0.03, 1.11 0.10 0.23 0.03, 1.12 0.10
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 14.5 3.43, 72.7 <0.001 19.8 4.29, 115 <0.001
    ICI + chemo/targeted therapy 1.17 0.15, 8.55 0.9 1.26 0.15, 9.63 0.8
B_PS_tri
    0
    1 0.98 0.36, 2.65 >0.9
    2-3 1.04 0.16, 6.32 >0.9
1 OR = Odds Ratio, CI = Confidence Interval

3.1.2 SirAEs~MET-hours logistic regression - continuous linear

Adjusted logistic regression for continuous linear variable.

glm_lin.adj <- glm(formula = T1_gr3_1y ~ totmet_hpwk + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri, family = binomial(link = "logit"), data = data_tox)
# tbl_regression(glm_lin.adj, exponentiate=T)

3.1.3 SirAEs~MET-hours logistic regression - assumptions

Assess linearity of totmet_hpwk and B_age with outcome.

fit <- glm(formula = T1_gr3_1y ~ totmet_hpwk , family = binomial(link = "logit"), data = data_tox)
residual_lp_binned(data_tox, fit = fit, title = "totmet_hpwk")

fit <- glm(formula = T1_gr3_1y ~ B_age , family = binomial(link = "logit"), data = data_tox)
residual_lp_binned(data_tox, fit = fit, title = "age")

Assess whether residuals are normally distributed.

plot(glm_lin.adj, which = 2)

Check homoscedasticity.

residual_lp_binned(data_tox, glm_lin.adj)

residual_lp_binned(data_tox, glm_tert.adj)

3.1.4 SirAEs~MET-hours logistic regression - continuous splines

3.1.4.1 Initial regression

Conduct survival analysis using splines regression by the prespecified knots. We make use of the rcs() function in package rms which provides restricted cubic splines.

glm_rcs.adj <- glm(formula = T1_gr3_1y ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, family = binomial(link = "logit"), data = data_tox)
tbl_regression(glm_rcs.adj, exponentiate=T)
Characteristic OR1 95% CI1 p-value
totmet_hpwk
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk 0.97 0.95, 0.99 0.005
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk' 1.03 1.00, 1.05 0.023
B_sex
    male
    female 1.52 0.63, 3.63 0.3
B_age 1.03 1.00, 1.07 0.088
B_tumor_type.1
    melanoma
    NSCLC 1.15 0.15, 8.15 0.9
    RCC 0.36 0.09, 1.27 0.12
    other 0.76 0.12, 4.25 0.8
B_paladj
    palliative
    adjuvant 0.70 0.17, 3.03 0.6
B_ther_prev
    no
    yes 0.28 0.04, 1.27 0.14
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 11.5 2.79, 55.6 0.001
    ICI + chemo/targeted therapy 1.17 0.15, 8.10 0.9
1 OR = Odds Ratio, CI = Confidence Interval
3.1.4.1.1 Check added value of using splines

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether totmet_hpwk improves the model fit at all (compare a model with splines for totmet_hpwk (Model 1) to a model without totmet_hpwk (Model 2)).

anova(glm_rcs.adj,update(glm_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: T1_gr3_1y ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + 
##     B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
## Model 2: T1_gr3_1y ~ B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + 
##     B_ther_cur_regrouped.1
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1       197     148.79                       
## 2       199     157.69 -2   -8.905  0.01165 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Than, we check whether using splines for totmet_hpwk gives a better fit than adding it as a linear variable (compare a model with splines for totmet_hpwk (Model 1) to a model with totmet_hpwk as is (Model 2)).

anova(glm_rcs.adj,update(glm_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)+totmet_hpwk), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: T1_gr3_1y ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + 
##     B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
## Model 2: T1_gr3_1y ~ B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + 
##     B_ther_cur_regrouped.1 + totmet_hpwk
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1       197     148.79                       
## 2       198     154.24 -1  -5.4503  0.01956 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.1.4.2 Visualise the regression coefficient

We want to visualise the Odds ratio (OR) for developing a severe irAE within 1 year over the trajectory of our variable of interest (totmet_hpwk). To do this, we will make a dummy dataset to predict the OR for certain values of totmet_hpwk. We will also obtain the 95% confidence interval (CI).

3.1.4.2.1 Make dummy dataset to obtain predicted OR
  • Make a new data frame expanding on existing data
  • Make a sequence of 300 equally distributed numbers between the minimum and maximum value of totmet_hpwk.
    • could also be more or less than 300, depending on the spread of totmet_hpwk.
  • Set some reference levels for the other variables of interest.
    • The choice of the reference category won’t impact the results.
if(!is.null(glm_rcs.adj$na.action)){
  newdf <- data_tox[-c(glm_rcs.adj$na.action),]
} else {
  newdf <- data_tox
}

newdf$totmet_hpwk <- c(seq(min(data_tox$totmet_hpwk),max(data_tox$totmet_hpwk),length=(nrow(newdf)-length(k.pos.totmet_hpwk))), k.pos.totmet_hpwk)

Now we make a model matrix, which is needed in the subsequent steps.

mm <- model.matrix(glm_rcs.adj, data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("totmet_hpwk",colnames(mm))]] <- 0 # set all variables not about totmet_hpwk to 0

For each value of our sequence of totmet_hpwk, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% conficence intervalls (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(totmet_hpwk=newdf$totmet_hpwk,eta=mm%*%coef(glm_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(glm_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain OR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for totmet_hpwk) and sort on totmet_hpwk.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$totmet_hpwk),] #sort based on var1 order
3.1.4.2.2 Make splines plot

Now, we can finally make the actual plot.

  • We use geom_ribbon() to obtain the 95%CI.
  • We use geom_hline() to draw a line at the reference value 1.
  • We use geom_line() to draw the line for the HR.
  • We use geom_point() to draw the location of the splines.
  • We use geom_rug() to visualise the instances of var1 on the x-axis.
  • We use scale_y_log10() to transform the y-axis by a 10 base logarithm and specify the breaks.
p_splines_tox.OR_MET <- ggplot(data = est
                )+ geom_ribbon(aes(x=totmet_hpwk, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=totmet_hpwk,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$totmet_hpwk%in%k.pos.totmet_hpwk,] #to plot the places of the knots
                , aes(x=totmet_hpwk, y=eta)
                )+ geom_rug(data = data_tox, aes(x=totmet_hpwk) #add rugs representing real observations for totmet_hpwk
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.01,seq(0,1,.2)/10,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.01,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("total PA (MET-hours/week)" #add x-axis label
                )+ ylab("Odds ratio for severe irAE" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )

p_splines_tox.OR_MET

3.1.5 Visualize predicted probabilities of ficticious patient

Alternatively, we can visualize the predicted probabilities given a patient in whom only the variable of interest changes.

newdf <- with(data_tox,
              expand.grid(totmet_hpwk=c(seq(min(totmet_hpwk),max(totmet_hpwk),length=300), k.pos.totmet_hpwk)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = round(median(B_age), digits = 0)
                          , B_tumor_type.1 = levels(B_tumor_type.1)[1]
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          , B_PS_tri = levels(B_PS_tri)[1]
                          ))
newdf$prob <- predict(glm_rcs.adj, newdata = newdf, type = "response")
p_splines_tox.prob <- ggplot(data = newdf
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_hline(yintercept=0 #add horizontal line at 0
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=totmet_hpwk,y=prob) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=newdf[newdf$totmet_hpwk%in%k.pos.totmet_hpwk,] #to plot the places of the knots
                , aes(x=totmet_hpwk, y=prob)
                )+ geom_rug(data = data_tox, aes(x=totmet_hpwk) #add rugs representing real observations for totmet_hpwk
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.0,1.0) #set y-axis
                )+ labs(x = "total PA (MET-hours/week)" #add x-axis label
                        ,y ="Probability of  severe irAE" #add y-axis label
                        ,title = "probability of SirAE for a fictitious patient with differing total PA"
                        ,caption = "64 year old male with ECOG performance status 0 and melanoma treated with first line anti-PD-(L)1 in palliative setting"
                )+ theme_bw( #add/alter theme
                )

p_splines_tox.prob

3.2 MVPA-SL and severe irAEs - ECOG adjusted

3.2.1 SirAEs~MVPA-SL logistic regression - tertiles

Adjusted logistic regression for tertiles variable.

glm_tert.adj <- glm(formula = T1_gr3_1y ~ MZvt_HWK_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, family = binomial(link = "logit"), data = data_tox)
glm_tert.adj_PS <- glm(formula = T1_gr3_1y ~ MZvt_HWK_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri, family = binomial(link = "logit"), data = data_tox)
tbl_merge(list(
  tbl_regression(glm_tert.adj, exponentiate=T),
  tbl_regression(glm_tert.adj_PS, exponentiate=T)
  ),
  tab_spanner = c("without ECOG PS", "with ECOG PS")
)
Characteristic without ECOG PS with ECOG PS
OR1 95% CI1 p-value OR1 95% CI1 p-value
MZvt_HWK_tertiles
    [0,1.5]
    (1.5,6.5] 0.42 0.16, 1.06 0.073 0.40 0.14, 1.03 0.064
    (6.5,38.5] 0.11 0.03, 0.38 0.001 0.12 0.03, 0.42 0.002
B_sex
    male
    female 1.11 0.45, 2.68 0.8 1.29 0.51, 3.25 0.6
B_age 1.05 1.01, 1.10 0.009 1.05 1.01, 1.10 0.017
B_tumor_type.1
    melanoma
    NSCLC 1.08 0.12, 8.28 >0.9 1.40 0.15, 11.5 0.8
    RCC 0.30 0.07, 1.09 0.074 0.28 0.07, 1.07 0.070
    other 0.80 0.13, 4.59 0.8 1.07 0.16, 6.88 >0.9
B_paladj
    palliative
    adjuvant 0.85 0.20, 3.83 0.8 1.25 0.26, 6.96 0.8
B_ther_prev
    no
    yes 0.30 0.04, 1.42 0.2 0.29 0.04, 1.42 0.2
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 15.3 3.53, 80.4 <0.001 21.0 4.42, 129 <0.001
    ICI + chemo/targeted therapy 1.45 0.18, 11.3 0.7 1.60 0.18, 13.0 0.7
B_PS_tri
    0
    1 0.96 0.35, 2.61 >0.9
    2-3 1.20 0.19, 7.09 0.8
1 OR = Odds Ratio, CI = Confidence Interval

3.2.2 SirAEs~MVPA-SL logistic regression - continuous linear

Adjusted logistic regression for continuous linear variable.

glm_lin.adj <- glm(formula = T1_gr3_1y ~ MZvt_HWK + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri, family = binomial(link = "logit"), data = data_tox)
# tbl_regression(glm_lin.adj, exponentiate=T)

3.2.3 SirAEs~MVPA-SL logistic regression - assumptions

Check assumptions:

  • Observations are independent:
    • is the case by design.
  • Relationships of explanatory variables with outcome should be linear:
    • Residual vs Fitted(=predictor) plot per (numeric) covariate.
    • Should show a horizontal band without curvature/pattern.
    • Since our outcome is binary, we need to bin the residuals.
  • Residuals should be normally distributed:
    • QQ-plot: points should lie on one diagonal line.
  • Homoscedasticity=homogeneity of variances:
    • The spread of the residuals does not depend on the linear predictor.
    • Residual vs Fitted(=linear predictor) plot.
    • Should show a horizontal band without curvature/pattern.

Assess linearity of MZvt_HWK and B_age with outcome.

fit <- glm(formula = T1_gr3_1y ~ MZvt_HWK , family = binomial(link = "logit"), data = data_tox)
residual_lp_binned(data_tox, fit = fit, title = "MZvt_HWK")

fit <- glm(formula = T1_gr3_1y ~ B_age , family = binomial(link = "logit"), data = data_tox)
residual_lp_binned(data_tox, fit = fit, title = "age")

Assess whether residuals are normally distributed.

plot(glm_lin.adj, which = 2)

Check homoscedasticity.

residual_lp_binned(data_tox, glm_lin.adj)

residual_lp_binned(data_tox, glm_tert.adj)

3.2.4 SirAEs~MVPA-SL logistic regression - continuous splines

3.2.4.1 Initial regression

Conduct survival analysis using splines regression by the prespecified knots. We make use of the rcs() function in package rms which provides restricted cubic splines.

glm_rcs.adj <- glm(formula = T1_gr3_1y ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri, family = binomial(link = "logit"), data = data_tox)
tbl_regression(glm_rcs.adj, exponentiate=T)
Characteristic OR1 95% CI1 p-value
MZvt_HWK
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK 0.80 0.62, 1.02 0.073
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK' 1.21 0.72, 1.93 0.4
B_sex
    male
    female 1.23 0.49, 3.08 0.7
B_age 1.05 1.01, 1.10 0.019
B_tumor_type.1
    melanoma
    NSCLC 1.35 0.15, 11.0 0.8
    RCC 0.27 0.06, 1.02 0.060
    other 1.04 0.16, 6.59 >0.9
B_paladj
    palliative
    adjuvant 1.16 0.24, 6.37 0.9
B_ther_prev
    no
    yes 0.27 0.03, 1.33 0.15
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 19.0 4.06, 114 <0.001
    ICI + chemo/targeted therapy 1.47 0.17, 11.8 0.7
B_PS_tri
    0
    1 0.96 0.35, 2.61 >0.9
    2-3 1.11 0.17, 6.65 >0.9
1 OR = Odds Ratio, CI = Confidence Interval
3.2.4.1.1 Check added value of using splines

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether MZvt_HWK improves the model fit at all (compare a model with splines for MZvt_HWK (Model 1) to a model without MZvt_HWK (Model 2)).

anova(glm_rcs.adj,update(glm_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: T1_gr3_1y ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + 
##     B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + 
##     B_PS_tri
## Model 2: T1_gr3_1y ~ B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + 
##     B_ther_cur_regrouped.1 + B_PS_tri
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1       189     139.07                        
## 2       191     149.91 -2   -10.84 0.004427 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Than, we check whether using splines for MZvt_HWK gives a better fit than adding it as a linear variable (compare a model with splines for MZvt_HWK (Model 1) to a model with MZvt_HWK as is (Model 2)).

anova(glm_rcs.adj,update(glm_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)+MZvt_HWK), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: T1_gr3_1y ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + 
##     B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + 
##     B_PS_tri
## Model 2: T1_gr3_1y ~ B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + 
##     B_ther_cur_regrouped.1 + B_PS_tri + MZvt_HWK
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1       189     139.07                     
## 2       190     139.62 -1 -0.54822    0.459

3.2.4.2 Visualise the regression coefficient

3.2.4.2.1 Make dummy dataset to obtain predicted OR
if(!is.null(glm_rcs.adj$na.action)){
    newdf <- data_tox[-c(glm_rcs.adj$na.action),]
} else {
  newdf <- data_tox
}

newdf$MZvt_HWK <- c(seq(min(data_tox$MZvt_HWK),max(data_tox$MZvt_HWK),length=(nrow(newdf)-length(k.pos.MZvt_HWK))), k.pos.MZvt_HWK)

Now we make a model matrix, which is needed in the subsequent steps.

mm <- model.matrix(glm_rcs.adj, data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("MZvt_HWK",colnames(mm))]] <- 0 # set all variables not about MZvt_HWK to 0

For each value of our sequence of MZvt_HWK, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% conficence intervalls (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(MZvt_HWK=newdf$MZvt_HWK,eta=mm%*%coef(glm_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(glm_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain OR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for MZvt_HWK) and sort on MZvt_HWK.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$MZvt_HWK),] #sort based on var1 order
3.2.4.2.2 Make splines plot
p_splines_tox.OR_MVPASL <- ggplot(data = est
                )+ geom_ribbon(aes(x=MZvt_HWK, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=MZvt_HWK,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$MZvt_HWK%in%k.pos.MZvt_HWK,] #to plot the places of the knots
                , aes(x=MZvt_HWK, y=eta)
                )+ geom_rug(data = data_tox, aes(x=MZvt_HWK) #add rugs representing real observations for MZvt_HWK
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.01,seq(0,1,.2)/10,seq(0,1,.2), seq(0,1,.2)*10))
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.01,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("MVPA-SL (hours/week)" #add x-axis label
                )+ ylab("Odds ratio for severe irAE" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )

p_splines_tox.OR_MVPASL

3.2.5 Visualize predicted probabilities of ficticious patient

Alternatively, we can visualize the predicted probabilities given a patient in whom only the variable of interest changes.

newdf <- with(data_tox,
              expand.grid(MZvt_HWK=c(seq(min(MZvt_HWK),max(MZvt_HWK),length=300), k.pos.MZvt_HWK)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = round(median(B_age), digits = 0)
                          , B_tumor_type.1 = levels(B_tumor_type.1)[1]
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          , B_PS_tri = levels(B_PS_tri)[1]
                          ))
newdf$prob <- predict(glm_rcs.adj, newdata = newdf, type = "response")
p_splines_tox.prob <- ggplot(data = newdf
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_hline(yintercept=0 #add horizontal line at 0
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=MZvt_HWK,y=prob) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=newdf[newdf$MZvt_HWK%in%k.pos.MZvt_HWK,] #to plot the places of the knots
                , aes(x=MZvt_HWK, y=prob)
                )+ geom_rug(data = data_tox, aes(x=MZvt_HWK) #add rugs representing real observations for MZvt_HWK
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.0,1.0) #set y-axis
                )+ labs(x = "MVPA-SL (hours/week)" #add x-axis label
                        ,y ="Probability of  severe irAE" #add y-axis label
                        ,title = "probability of SirAE for a fictitious patient with differing MVPA-SL"
                        ,caption = "64 year old male with ECOG performance status 0 and melanoma treated with first line anti-PD-(L)1 in palliative setting"
                )+ theme_bw( #add/alter theme
                )

p_splines_tox.prob

3.3 Combine plots

Make Supplementary Figure 1.

suppl_figure1 <- ggarrange(p_splines_tox.OR_MET, p_splines_tox.OR_MVPASL
                     ,labels = c("A","B")
                     ,ncol=2, nrow=1
                     ,widths = c(1,1), heights = c(1,1)
                     ,common.legend=F
                     )
suppl_figure1

Save as PDF.


4 SirAE~PA - ipilimumab+nivolumab subgroup

Make tox_cICI.

tox_cICI <- data_tox[data_tox$B_ther_cur_regrouped.1%in%"cICI",]
tox_cICI$B_tumor_type.1 <- droplevels(tox_cICI$B_tumor_type.1)
table(tox_cICI$B_tumor_type.1, useNA = "always")
## 
## melanoma      RCC    other     <NA> 
##       25       23        1        0

Notably, only one patient in our cohort developed his/her first severe irAE after one year. This patient is thus regarded as without severe irAE <1 year.

table(tox_cICI$T1_gr3_1y, useNA = "always")
## 
##   no  yes <NA> 
##   27   22    0

4.1 Baseline characteristics

Specify data frame to use

data <- tox_cICI

Relabel variables and add units that you want to use in your table (all formatting to let the table look nicely):

label(data$B_sex) <- "sex"
label(data$B_age) <- "age"
units(data$B_age) <- "years"
label(data$B_PS) <- "ECOG performance status"
label(data$B_tumor_type.1) <- "tumor type"
label(data$B_tumor_stage_dich) <- "stage"
label(data$B_paladj) <- "treatment intent"
label(data$B_ther_prev) <- "previous systemic therapy"
label(data$B_ther_cur_regrouped.1) <- "therapy"
label(data$T1_gr3_1y) <- "severe irAE within 1 year"
label(data$totmet_hpwk_tertiles) <- "total PA"
label(data$totmet_hpwk) <- "total PA"
units(data$totmet_hpwk) <- "hours/week"
label(data$MZvt_HWK) <- "MVPA-SL"
units(data$MZvt_HWK) <- "hours/week"

Make the actual table:

  • ~ B_sex + B_age + ... + B_ther_cur_regrouped.1 defines the rows;
  • | totmet_hpwk_tertiles defines the columns;
  • the rest are just to change the appearance of the table.
table1(~ totmet_hpwk + MZvt_HWK + B_sex + B_age + B_PS + B_tumor_type.1 + B_tumor_stage_dich + B_paladj + B_ther_prev + B_ther_cur_regrouped.1| totmet_hpwk_tertiles, 
       data = data, 
       overall = "Overall", 
       # caption = "Table 1: baseline characteristics of combined ipilimumab",
       # footnote = "RCC: renal cell carcinoma",
       topclass = "Rtable1-zebra",
       render.categorical = my.render.cat,
       render.continuous = my.render.cont,#c(.="median [Q1, Q3]"),
       render.missing = my.render.missing,
       droplevels = TRUE)
[0,51]
(N=16)
(51,101]
(N=16)
(101,371]
(N=17)
Overall
(N=49)
total PA (hours/week)
median [Q1-Q3] 24.6 [8.3-35.5] 76.1 [64.5-88.3] 126.3 [119.4-178.7] 77.8 [35.6-119.4]
mean (SD) 22.7 (16.1) 76.1 (15.6) 156.0 (67.7) 86.4 (69.2)
MVPA-SL (hours/week)
median [Q1-Q3] 0.0 [0.0-1.9] 2.2 [0.5-5.3] 7.0 [4.5-15.0] 3.0 [0.0-7.0]
mean (SD) 1.0 (1.7) 3.7 (4.2) 10.2 (7.3) 5.1 (6.3)
sex
male 11 (68.8%) 11 (68.8%) 12 (70.6%) 34 (69.4%)
female 5 (31.2%) 5 (31.2%) 5 (29.4%) 15 (30.6%)
age (years)
median [Q1-Q3] 65.5 [56.8-72.3] 62.5 [53.0-74.3] 60.0 [57.0-67.0] 62.0 [55.0-72.0]
mean (SD) 61.4 (15.7) 61.1 (13.9) 62.4 (8.9) 61.6 (12.8)
ECOG performance status
0 8 (50.0%) 5 (31.2%) 9 (52.9%) 22 (44.9%)
1 4 (25.0%) 11 (68.8%) 6 (35.3%) 21 (42.9%)
2 3 (18.8%) 0 (0.0%) 1 (5.9%) 4 (8.2%)
3 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
4 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
unknown 1 (6.2%) 0 (0.0%) 1 (5.9%) 2 (4.1%)
tumor type
melanoma 8 (50.0%) 8 (50.0%) 9 (52.9%) 25 (51.0%)
RCC 7 (43.8%) 8 (50.0%) 8 (47.1%) 23 (46.9%)
other 1 (6.2%) 0 (0.0%) 0 (0.0%) 1 (2.0%)
stage
III 0 (0.0%) 0 (0.0%) 1 (5.9%) 1 (2.0%)
IV 16 (100.0%) 16 (100.0%) 16 (94.1%) 48 (98.0%)
other 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
treatment intent
palliative 16 (100.0%) 16 (100.0%) 17 (100.0%) 49 (100.0%)
adjuvant 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
previous systemic therapy
no 15 (93.8%) 15 (93.8%) 16 (94.1%) 46 (93.9%)
yes 1 (6.2%) 1 (6.2%) 1 (5.9%) 3 (6.1%)
therapy
anti-PD-(L)1 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
cICI 16 (100.0%) 16 (100.0%) 17 (100.0%) 49 (100.0%)
ICI + chemo/targeted therapy 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)

Indeed we selected only patients who received cICI

Check which type of tumor the “other” is.

tox_cICI[tox_cICI$B_tumor_type.1%in%"other",c("record_id","B_tumor_type","B_tumor_type_other")]

4.2 SirAE~MET ipilimumab+nivolumab subgroup

4.2.1 SirAE~MET ipilimumab+nivolumab subgroup logistic regression - tertiles

Adjusted logistic regression for tertiles variable.

glm_tert.adj <- glm(formula = T1_gr3_1y ~ totmet_hpwk_tertiles + B_sex + B_age,
                    family = binomial(link = "logit"),
                    data = tox_cICI)
tbl_regression(glm_tert.adj, exponentiate=T)
Characteristic OR1 95% CI1 p-value
totmet_hpwk_tertiles
    [0,51]
    (51,101] 0.46 0.10, 1.89 0.3
    (101,371] 0.54 0.13, 2.14 0.4
B_sex
    male
    female 1.67 0.48, 5.97 0.4
B_age 1.01 0.97, 1.06 0.6
1 OR = Odds Ratio, CI = Confidence Interval

4.2.2 SirAE~MET ipilimumab+nivolumab subgroup logistic regression - continuous linear

Adjusted logistic regression for continuous linear variable.

glm_lin.adj <- glm(formula = T1_gr3_1y ~ totmet_hpwk + B_sex + B_age, 
                   family = binomial(link = "logit"), 
                   data = tox_cICI)
tbl_regression(glm_lin.adj, exponentiate=T)
Characteristic OR1 95% CI1 p-value
totmet_hpwk 1.00 0.99, 1.01 >0.9
B_sex
    male
    female 1.64 0.48, 5.81 0.4
B_age 1.01 0.97, 1.06 0.6
1 OR = Odds Ratio, CI = Confidence Interval

4.2.3 SirAE~MET ipilimumab+nivolumab subgroup logistic regression - assumptions

Check assumptions: - Observations are independent: - is the case by design. - Relationships of explanatory variables with outcome should be linear: - Residual vs Fitted(=predictor) plot per (numeric) covariate. - Should show a horizontal band without curvature/pattern. - Since our outcome is binary, we need to bin the residuals. - Residuals should be normally distributed: - QQ-plot: points should lie on one diagonal line. - Homoscedasticity=homogeneity of variances: - The spread of the residuals does not depend on the linear predictor. - Residual vs Fitted(=linear predictor) plot. - Should show a horizontal band without curvature/pattern.

Assess linearity of totmet_hpwk and B_age with outcome.

fit <- glm(formula = T1_gr3_1y ~ totmet_hpwk , family = binomial(link = "logit"), data = tox_cICI)
residual_lp_binned(tox_cICI, fit = fit, title = "totmet_hpwk")

fit <- glm(formula = T1_gr3_1y ~ B_age , family = binomial(link = "logit"), data = tox_cICI)
residual_lp_binned(tox_cICI, fit = fit, title = "age")

Assess whether residuals are normally distributed.

plot(glm_lin.adj, which = 2)

Check homoscedasticity.

residual_lp_binned(tox_cICI, glm_lin.adj)

residual_lp_binned(tox_cICI, glm_tert.adj)

4.2.4 SirAE~MET ipilimumab+nivolumab subgroup logistic regression - continuous splines

Initial regression.

glm_rcs.adj <- glm(formula = T1_gr3_1y ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age, 
                   family = binomial(link = "logit"),
                   data = tox_cICI)
tbl_regression(glm_rcs.adj, exponentiate=T)
Characteristic OR1 95% CI1 p-value
totmet_hpwk
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk 0.98 0.96, 1.01 0.2
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk' 1.02 1.00, 1.06 0.13
B_sex
    male
    female 1.61 0.45, 5.90 0.5
B_age 1.01 0.96, 1.06 0.7
1 OR = Odds Ratio, CI = Confidence Interval

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether totmet_hpwk improves the model fit at all (compare a model with splines for totmet_hpwk (Model 1) to a model without totmet_hpwk (Model 2)).

anova(glm_rcs.adj,update(glm_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: T1_gr3_1y ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + 
##     B_age
## Model 2: T1_gr3_1y ~ B_sex + B_age
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        44     63.846                     
## 2        46     66.469 -2  -2.6225   0.2695

Than, we check whether using splines for totmet_hpwk gives a better fit than adding it as a linear variable (compare a model with splines for totmet_hpwk (Model 1) to a model with totmet_hpwk as is (Model 2)).

anova(glm_rcs.adj,update(glm_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)+totmet_hpwk), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: T1_gr3_1y ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + 
##     B_age
## Model 2: T1_gr3_1y ~ B_sex + B_age + totmet_hpwk
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        44     63.846                     
## 2        45     66.455 -1  -2.6088   0.1063

Make dummy database.

newdf <- tox_cICI

newdf$totmet_hpwk <- c(seq(min(tox_cICI$totmet_hpwk),max(tox_cICI$totmet_hpwk),length=(nrow(newdf)-length(k.pos.totmet_hpwk))), k.pos.totmet_hpwk)

Now we make a model matrix, which is needed in the subsequent steps.

mm <- model.matrix(glm_rcs.adj, data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("totmet_hpwk",colnames(mm))]] <- 0 # set all variables not about totmet_hpwk to 0

For each value of our sequence of totmet_hpwk, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% conficence intervalls (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(totmet_hpwk=newdf$totmet_hpwk,eta=mm%*%coef(glm_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(glm_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain OR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for totmet_hpwk) and sort on totmet_hpwk.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$totmet_hpwk),] #sort based on var1 order

4.2.4.1 Make splines plot

p_splines_tox.OR_MET_cICI <- ggplot(data = est
                )+ geom_ribbon(aes(x=totmet_hpwk, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=totmet_hpwk,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$totmet_hpwk%in%k.pos.totmet_hpwk,] #to plot the places of the knots
                , aes(x=totmet_hpwk, y=eta)
                )+ geom_rug(data = tox_cICI, aes(x=totmet_hpwk) #add rugs representing real observations for totmet_hpwk
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.01,seq(0,1,.2)/10,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.01,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("total PA (MET-hours/week)" #add x-axis label
                )+ ylab("Odds ratio for severe irAE" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )

p_splines_tox.OR_MET_cICI

4.2.5 Visualize predicted probabilities of ficticious patient

Alternatively, we can visualize the predicted probabilities given a patient in whom only the variable of interest changes.

newdf <- with(tox_cICI,
              expand.grid(totmet_hpwk=c(seq(min(totmet_hpwk),max(totmet_hpwk),length=300), k.pos.totmet_hpwk)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = round(median(B_age), digits = 0)
                          ))
newdf$prob <- predict(glm_rcs.adj, newdata = newdf, type = "response")
p_splines_tox.prob_MET_cICI <- ggplot(data = newdf
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_hline(yintercept=0 #add horizontal line at 0
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=totmet_hpwk,y=prob) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=newdf[newdf$totmet_hpwk%in%k.pos.totmet_hpwk,] #to plot the places of the knots
                , aes(x=totmet_hpwk, y=prob)
                )+ geom_rug(data = tox_cICI, aes(x=totmet_hpwk) #add rugs representing real observations for totmet_hpwk
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.0,1.0) #set y-axis
                )+ labs(x = "total PA (MET-hours/week)" #add x-axis label
                        ,y ="Probability of  severe irAE" #add y-axis label
                        ,title = "probability of SirAE for a fictitious patient with differing total PA"
                        ,caption = "62 year old male patient treated with cICI"
                )+ theme_bw( #add/alter theme
                )

p_splines_tox.prob_MET_cICI

4.3 MVPA-SL and severe irAEs

4.3.1 SirAEs~MVPA-SL logistic regression - tertiles

Adjusted logistic regression for tertiles variable.

glm_tert.adj <- glm(formula = T1_gr3_1y ~ MZvt_HWK_tertiles + B_sex + B_age, 
                    family = binomial(link = "logit"), 
                    data = tox_cICI)
tbl_regression(glm_tert.adj, exponentiate=T)
Characteristic OR1 95% CI1 p-value
MZvt_HWK_tertiles
    [0,1.5]
    (1.5,6.5] 0.58 0.14, 2.26 0.4
    (6.5,38.5] 0.31 0.06, 1.41 0.14
B_sex
    male
    female 1.34 0.37, 4.92 0.7
B_age 1.02 0.97, 1.07 0.4
1 OR = Odds Ratio, CI = Confidence Interval

4.3.2 SirAEs~MVPA-SL logistic regression - continuous linear

Adjusted logistic regression for continuous linear variable.

glm_lin.adj <- glm(formula = T1_gr3_1y ~ MZvt_HWK + B_sex + B_age, 
                   family = binomial(link = "logit"), 
                   data = tox_cICI)
tbl_regression(glm_lin.adj, exponentiate=T)
Characteristic OR1 95% CI1 p-value
MZvt_HWK 0.95 0.85, 1.05 0.4
B_sex
    male
    female 1.48 0.42, 5.32 0.5
B_age 1.02 0.97, 1.07 0.5
1 OR = Odds Ratio, CI = Confidence Interval

###SirAEs~MVPA-SL logistic regression - assumptions

Assess linearity of MZvt_HWK and B_age with outcome.

fit <- glm(formula = T1_gr3_1y ~ MZvt_HWK , family = binomial(link = "logit"), data = tox_cICI)
residual_lp_binned(tox_cICI, fit = fit, title = "MZvt_HWK")

fit <- glm(formula = T1_gr3_1y ~ B_age , family = binomial(link = "logit"), data = tox_cICI)
residual_lp_binned(tox_cICI, fit = fit, title = "age")

Assess whether residuals are normally distributed.

plot(glm_lin.adj, which = 2)

Check homoscedasticity.

residual_lp_binned(tox_cICI, glm_lin.adj)

residual_lp_binned(tox_cICI, glm_tert.adj)

4.3.3 SirAEs~MVPA-SL logistic regression - continuous splines

Initial regression.

glm_rcs.adj <- glm(formula = T1_gr3_1y ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age, 
                   family = binomial(link = "logit"), 
                   data = tox_cICI)
tbl_regression(glm_rcs.adj, exponentiate=T)
Characteristic OR1 95% CI1 p-value
MZvt_HWK
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK 0.92 0.67, 1.28 0.6
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK' 1.06 0.57, 1.89 0.8
B_sex
    male
    female 1.46 0.41, 5.28 0.6
B_age 1.02 0.97, 1.07 0.5
1 OR = Odds Ratio, CI = Confidence Interval

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether MZvt_HWK improves the model fit at all (compare a model with splines for MZvt_HWK (Model 1) to a model without MZvt_HWK (Model 2)).

anova(glm_rcs.adj,update(glm_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: T1_gr3_1y ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age
## Model 2: T1_gr3_1y ~ B_sex + B_age
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        44     65.557                     
## 2        46     66.469 -2 -0.91148    0.634

Than, we check whether using splines for MZvt_HWK gives a better fit than adding it as a linear variable (compare a model with splines for MZvt_HWK (Model 1) to a model with MZvt_HWK as is (Model 2)).

anova(glm_rcs.adj,update(glm_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)+MZvt_HWK), test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: T1_gr3_1y ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age
## Model 2: T1_gr3_1y ~ B_sex + B_age + MZvt_HWK
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        44     65.557                     
## 2        45     65.598 -1 -0.04129    0.839

Make dummy dataset

newdf <- tox_cICI

newdf$MZvt_HWK <- c(seq(min(tox_cICI$MZvt_HWK),max(tox_cICI$MZvt_HWK),length=(nrow(newdf)-length(k.pos.MZvt_HWK))), k.pos.MZvt_HWK)

Now we make a model matrix, which is needed in the subsequent steps.

mm <- model.matrix(glm_rcs.adj, data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("MZvt_HWK",colnames(mm))]] <- 0 # set all variables not about MZvt_HWK to 0

For each value of our sequence of MZvt_HWK, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% conficence intervalls (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(MZvt_HWK=newdf$MZvt_HWK,eta=mm%*%coef(glm_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(glm_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain OR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for MZvt_HWK) and sort on MZvt_HWK.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$MZvt_HWK),] #sort based on var1 order

4.3.3.1 Make splines plot

p_splines_tox.OR_MVPASL_cICI <- ggplot(data = est
                )+ geom_ribbon(aes(x=MZvt_HWK, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=MZvt_HWK,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$MZvt_HWK%in%k.pos.MZvt_HWK,] #to plot the places of the knots
                , aes(x=MZvt_HWK, y=eta)
                )+ geom_rug(data = tox_cICI, aes(x=MZvt_HWK) #add rugs representing real observations for MZvt_HWK
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.01,seq(0,1,.2)/10,seq(0,1,.2), seq(0,1,.2)*10))
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.01,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("MVPA-SL (hours/week)" #add x-axis label
                )+ ylab("Odds ratio for severe irAE" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )

p_splines_tox.OR_MVPASL_cICI

4.3.4 Visualize predicted probabilities of ficticious patient

Alternatively, we can visualize the predicted probabilities given a patient in whom only the variable of interest changes.

newdf <- with(tox_cICI,
              expand.grid(MZvt_HWK=c(seq(min(MZvt_HWK),max(MZvt_HWK),length=300), k.pos.MZvt_HWK)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = round(median(B_age), digits = 0)
                          ))
median(tox_cICI$B_age)
## [1] 62
newdf$prob <- predict(glm_rcs.adj, newdata = newdf, type = "response")
p_splines_tox.prob_MVPASL_cICI <- ggplot(data = newdf
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_hline(yintercept=0 #add horizontal line at 0
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=MZvt_HWK,y=prob) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=newdf[newdf$MZvt_HWK%in%k.pos.MZvt_HWK,] #to plot the places of the knots
                , aes(x=MZvt_HWK, y=prob)
                )+ geom_rug(data = tox_cICI, aes(x=MZvt_HWK) #add rugs representing real observations for MZvt_HWK
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.0,1.0) #set y-axis
                )+ labs(x = "MVPA-SL (hours/week)" #add x-axis label
                        ,y ="Probability of  severe irAE" #add y-axis label
                        ,title = "probability of SirAE for a fictitious patient with differing MVPA-SL"
                        ,caption = "62 year old male patient treated with cICI"
                )+ theme_bw( #add/alter theme
                )

p_splines_tox.prob_MVPASL_cICI

4.4 Combine plots

Make Supplementary Figure 2.

suppl_figure2 <- ggarrange(p_splines_tox.OR_MET_cICI, p_splines_tox.OR_MVPASL_cICI
                     ,labels = c("A","B")
                     ,ncol=2, nrow=1
                     ,widths = c(1,1), heights = c(1,1)
                     ,common.legend=F
                     )
suppl_figure2

Save as PDF.


5 SirAE~PA - Competing risk

Death might be a competing risk for irAE occurrence. In other words: patients who die early might not have survived long enough to have an irAE although they would have had one if they would have lived longer. The proportional subdistribution hazard model as described by Fine and Gray (FG), infers the rate of occurrence of the event (irAE) in patients who have not yet experienced that specific event, including patients who have experienced the competing event (death). While the Fine and Gray model provides us a Hazard Ratio (HR), we can obtain a graphical representation by plotting the cumulative incidence function.

We will use the data frame data_OS: data from all SQUASH respondents with survival data.

For competing risk analyses, a variable is needed reflecting all possible outcomes: - censor =censored; - SirAE =developed severe irAE; - death =died before developing a severe irAE (in our case all deaths were caused by disease progression).

5.1 SirAE~MET Cumulative incidence function - tertiles

Estimate the cumulative incidence in the context of competing risks using the cuminc() function from the tidycmprsk package, which is a wrapper around the cmprsk package by Gray. We can now obtain the cumulative incidence at a specific time point. To visualize the cumulative incidence function, we will use the ggcumic() function from the ggsurvfit package.

cif_tert.crude_MET <- tidycmprsk::cuminc(Surv(T1_gr3_days, T1_gr3_status) ~ totmet_hpwk_tertiles, data = data_OS)

tidycmprsk::tbl_cuminc(cif_tert.crude_MET,
                       times = 12*30.4375,
                       outcomes = c("SirAE" , "death"),
                       label_header = "1-year cuminc"
                       ) #%>%
Characteristic 1-year cuminc
SirAE
totmet_hpwk_tertiles
    [0,51] 30% (20%, 40%)
    (51,101] 13% (7.1%, 22%)
    (101,371] 13% (7.0%, 21%)
death
totmet_hpwk_tertiles
    [0,51] 28% (19%, 39%)
    (51,101] 12% (5.5%, 21%)
    (101,371] 11% (4.9%, 19%)
  # add_p() #to add p-value


cuminc_MET <- ggcuminc(cif_tert.crude_MET,
                       outcome = c("death", "SirAE")
                       ,theme = list(theme_ggsurvfit_default(),theme(legend.position = "right"))
                       # ) + add_confidence_interval(
                       ) + add_risktable(risktable_stats = "n.risk"
                       ) + labs(x = "Time (months)"
                                ,color = "total PA"
                                ,linetype = "outcome"
                       ) + coord_cartesian(ylim = c(0,0.5)
                       )
cuminc_MET

5.2 SirAE~MET FG - tertiles

For the subdistribution hazard regression, we will use the crr() function from the tidycmprsk package.

crr_tert.adj_MET <- tidycmprsk::crr(Surv(T1_gr3_days, T1_gr3_status) ~ totmet_hpwk_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, 
                                failcode = "SirAE",
                                data = data_OS)
tbl_regression(crr_tert.adj_MET, exponentiate=T)
Characteristic HR1 95% CI1 p-value
totmet_hpwk_tertiles
    [0,51]
    (51,101] 0.47 0.21, 1.05 0.066
    (101,371] 0.43 0.21, 0.90 0.024
B_sex
    male
    female 1.30 0.65, 2.61 0.5
B_age 1.01 0.98, 1.04 0.4
B_tumor_type.1
    melanoma
    NSCLC 1.34 0.28, 6.40 0.7
    RCC 0.53 0.21, 1.32 0.2
    other 0.78 0.19, 3.27 0.7
B_paladj
    palliative
    adjuvant 0.60 0.15, 2.36 0.5
B_ther_prev
    no
    yes 0.35 0.11, 1.14 0.081
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 8.03 2.74, 23.5 <0.001
    ICI + chemo/targeted therapy 1.63 0.50, 5.32 0.4
1 HR = Hazard Ratio, CI = Confidence Interval

5.3 SirAE~MVPA-SL Cumulative incidence function - tertiles

Estimate the cumulative incidence in the context of competing risks using the cuminc() function from the tidycmprsk package, which is a wrapper around the cmprsk package by Gray. We can now obtain the cumulative incidence at a specific time point. To visualize the cumulative incidence function, we will use the ggcumic() function from the ggsurvfit package.

cif_tert.crude_MVPASL <- tidycmprsk::cuminc(Surv(T1_gr3_days, T1_gr3_status) ~ MZvt_HWK_tertiles, data = data_OS)

tidycmprsk::tbl_cuminc(cif_tert.crude_MVPASL,
                       times = 12*30.4375,
                       outcomes = c("SirAE" , "death"),
                       label_header = "1-year cuminc"
                       ) #%>%
Characteristic 1-year cuminc
SirAE
MZvt_HWK_tertiles
    [0,1.5] 29% (20%, 39%)
    (1.5,6.5] 15% (8.1%, 24%)
    (6.5,38.5] 11% (5.6%, 20%)
death
MZvt_HWK_tertiles
    [0,1.5] 27% (17%, 37%)
    (1.5,6.5] 14% (6.9%, 23%)
    (6.5,38.5] 9.9% (4.3%, 18%)
  # add_p() #to add p-value


cuminc_MVPASL <- ggcuminc(cif_tert.crude_MVPASL,
                          outcome = c("death", "SirAE")
                          ,theme = list(theme_ggsurvfit_default(),theme(legend.position = "right"))
                          # ) + add_confidence_interval(
                          ) + add_risktable(risktable_stats = "n.risk"
                          ) + labs(x = "Time (months)"
                                   ,color = "MVPA-SL"
                                   ,linetype = "outcome"
                          ) + coord_cartesian(ylim = c(0,0.5)
                          )
cuminc_MVPASL

5.4 SirAE~MVPA-SL FG - tertiles

For the subdistribution hazard regression, we will use the crr() function from the tidycmprsk package.

crr_tert.adj_MVPASL <- tidycmprsk::crr(Surv(T1_gr3_days, T1_gr3_status) ~ MZvt_HWK_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, 
                                failcode = "SirAE",
                                data = data_OS)
tbl_regression(crr_tert.adj_MVPASL, exponentiate=T)
Characteristic HR1 95% CI1 p-value
MZvt_HWK_tertiles
    [0,1.5]
    (1.5,6.5] 0.60 0.29, 1.27 0.2
    (6.5,38.5] 0.38 0.17, 0.88 0.025
B_sex
    male
    female 1.13 0.58, 2.22 0.7
B_age 1.02 0.99, 1.05 0.2
B_tumor_type.1
    melanoma
    NSCLC 1.33 0.25, 7.19 0.7
    RCC 0.54 0.22, 1.37 0.2
    other 0.78 0.18, 3.43 0.7
B_paladj
    palliative
    adjuvant 0.60 0.14, 2.46 0.5
B_ther_prev
    no
    yes 0.38 0.12, 1.25 0.11
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 7.69 2.52, 23.5 <0.001
    ICI + chemo/targeted therapy 1.71 0.48, 6.08 0.4
1 HR = Hazard Ratio, CI = Confidence Interval

5.5 Combine cumulative incidence plots

Make Supplementary Figure 3.

suppl_figure3 <- ggarrange(cuminc_MET, cuminc_MVPASL
                     ,labels = c("A","B")
                     ,ncol=2, nrow=1
                     ,widths = c(1,1), heights = c(1,1)
                     ,common.legend=F
                     )
suppl_figure3

Save as PDF.

6 OS~PA - irresectable subgroup

Make OS_pal.

OS_pal <- data_OS[data_OS$B_paladj%in%"palliative",]

6.1 Baseline characteristics

Specify data frame to use

data <- OS_pal

Relabel variables and add units that you want to use in your table (all formatting to let the table look nicely):

label(data$B_sex) <- "sex"
label(data$B_age) <- "age"
units(data$B_age) <- "years"
label(data$B_PS) <- "ECOG performance status"
label(data$B_tumor_type.1) <- "tumor type"
label(data$B_tumor_stage_dich) <- "stage"
label(data$B_paladj) <- "treatment intent"
label(data$B_ther_prev) <- "previous systemic therapy"
label(data$B_ther_cur_regrouped.1) <- "therapy"
label(data$T1_gr3_1y) <- "severe irAE within 1 year"
label(data$totmet_hpwk_tertiles) <- "total PA"
label(data$totmet_hpwk) <- "total PA"
units(data$totmet_hpwk) <- "hours/week"
label(data$MZvt_HWK) <- "MVPA-SL"
units(data$MZvt_HWK) <- "hours/week"

Make the actual table:

  • ~ B_sex + B_age + ... + B_ther_cur_regrouped.1 defines the rows;
  • | totmet_hpwk_tertiles defines the columns;
  • the rest are just to change the appearance of the table.
table1(~ totmet_hpwk + MZvt_HWK + B_sex + B_age + B_PS + B_tumor_type.1 + B_tumor_stage_dich + B_paladj + B_ther_prev + B_ther_cur_regrouped.1| totmet_hpwk_tertiles, 
       data = data, 
       overall = "Overall", 
       # caption = "Table 1: baseline characteristics of combined ipilimumab",
       # footnote = "RCC: renal cell carcinoma",
       topclass = "Rtable1-zebra",
       render.categorical = my.render.cat,
       render.continuous = my.render.cont,#c(.="median [Q1, Q3]"),
       render.missing = my.render.missing,
       droplevels = TRUE)
[0,51]
(N=62)
(51,101]
(N=46)
(101,371]
(N=42)
Overall
(N=150)
total PA (hours/week)
median [Q1-Q3] 30.0 [15.0-39.1] 73.0 [62.2-82.6] 122.9 [114.5-160.4] 62.6 [35.2-107.3]
mean (SD) 26.8 (14.4) 73.1 (13.3) 143.1 (49.6) 73.5 (55.6)
MVPA-SL (hours/week)
median [Q1-Q3] 0.5 [0.0-2.2] 5.0 [1.1-8.0] 7.1 [4.1-13.4] 3.1 [0.0-7.0]
mean (SD) 1.5 (2.2) 5.3 (4.7) 9.4 (6.8) 4.9 (5.6)
sex
male 42 (67.7%) 31 (67.4%) 24 (57.1%) 97 (64.7%)
female 20 (32.3%) 15 (32.6%) 18 (42.9%) 53 (35.3%)
age (years)
median [Q1-Q3] 67.0 [59.5-74.8] 64.5 [54.0-74.0] 62.5 [58.0-69.8] 65.5 [57.0-74.0]
mean (SD) 65.4 (13.0) 62.4 (15.2) 64.0 (10.3) 64.1 (13.0)
ECOG performance status
0 18 (29.0%) 15 (32.6%) 22 (52.4%) 55 (36.7%)
1 30 (48.4%) 29 (63.0%) 16 (38.1%) 75 (50.0%)
2 11 (17.7%) 1 (2.2%) 3 (7.1%) 15 (10.0%)
3 2 (3.2%) 0 (0.0%) 0 (0.0%) 2 (1.3%)
4 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
unknown 1 (1.6%) 1 (2.2%) 1 (2.4%) 3 (2.0%)
tumor type
melanoma 24 (38.7%) 19 (41.3%) 20 (47.6%) 63 (42.0%)
NSCLC 13 (21.0%) 7 (15.2%) 9 (21.4%) 29 (19.3%)
RCC 10 (16.1%) 9 (19.6%) 9 (21.4%) 28 (18.7%)
other 15 (24.2%) 11 (23.9%) 4 (9.5%) 30 (20.0%)
stage
III 4 (6.5%) 6 (13.0%) 4 (9.5%) 14 (9.3%)
IV 57 (91.9%) 39 (84.8%) 38 (90.5%) 134 (89.3%)
other 1 (1.6%) 1 (2.2%) 0 (0.0%) 2 (1.3%)
treatment intent
palliative 62 (100.0%) 46 (100.0%) 42 (100.0%) 150 (100.0%)
adjuvant 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
previous systemic therapy
no 45 (72.6%) 37 (80.4%) 35 (83.3%) 117 (78.0%)
yes 17 (27.4%) 9 (19.6%) 7 (16.7%) 33 (22.0%)
therapy
anti-PD-(L)1 30 (48.4%) 23 (50.0%) 11 (26.2%) 64 (42.7%)
cICI 21 (33.9%) 18 (39.1%) 22 (52.4%) 61 (40.7%)
ICI + chemo/targeted therapy 11 (17.7%) 5 (10.9%) 9 (21.4%) 25 (16.7%)

Indeed we selected only palliative patients.

6.2 OS~MET irresectable subgroup

To confirm the results for overall survival we will perform the Cox proportional hazard regression and KM curves, to asses the correlation of total MET hours per week (totmet_hpwk_tertiles) and MVPA-SL (MZvt_HWK_tertiles), in different functional forms - categorial: tertiles by sample size of overall dataset - continuous: splines to see whether shape is similar to all patients

6.2.1 OS~MET irresectable subgroup Kaplan Meier curves - tertiles

We first flip the order of levels of totmet_hpwk_tertiles, since it appears that patients with highest values have better survival and thus it is more intuitive.

OS_pal$totmet_hpwk_tertiles_KM <- factor(OS_pal$totmet_hpwk_tertiles
                                          , levels = rev(levels(OS_pal$totmet_hpwk_tertiles)))
fit_tert.crude <- survfit(Surv(OS_days, OS_status)~ totmet_hpwk_tertiles_KM , 
                          data = OS_pal)
KM_OS_MET_pal <- ggsurvplot(fit_tert.crude
                     ,conf.int = F
                     ,pval = F
                     ,xscale="d_m"
                     ,break.x.by=30.4375*6
                     # ,title=""
                     ,xlab="Time (months)"
                     ,ylab="Survival probability"
                     ,legend = c(.26,.24)
                     ,legend.title="total PA"
                     ,legend.labs=c(paste0("high ",
                                           levels(OS_pal$totmet_hpwk_tertiles_KM)[1]), 
                                    paste0("moderate ",
                                           levels(OS_pal$totmet_hpwk_tertiles_KM)[2]),
                                    paste0("low ",
                                           levels(OS_pal$totmet_hpwk_tertiles_KM)[3]))
                     ,axes.offset=F
                     ,surv.median.line = "none"
                     ,consor=T
                     ,censor.shape=124
                     ,censor.size=2
                     ,risk.table = T
                     ,risk.table.y.text=F
                     ,fontsize=3.5
                     ,ggtheme = theme_survminer(font.x = 12, font.y=12, font.main=12, font.tickslab=10, font.legend=10)
                     ,tables.theme = theme_cleantable(base_size = 10, font.x=10, font.y=10, font.main=12)
          )
KM_OS_MET_pal
KM_OS_MET_pal$plot <- KM_OS_MET_pal$plot +
  theme(legend.background = element_rect(fill=alpha("white", alpha=0)))

6.2.2 OS~MET irresectable subgroup Cox regression - tertiles

Adjusted Cox regression for tertiles linear variable

cox_tert.adj <- coxph(Surv(OS_days, OS_status)~ totmet_hpwk_tertiles + B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1, 
                      data = OS_pal)
tbl_regression(cox_tert.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
totmet_hpwk_tertiles
    [0,51]
    (51,101] 0.55 0.29, 1.04 0.066
    (101,371] 0.39 0.19, 0.81 0.011
B_sex
    male
    female 1.12 0.64, 1.93 0.7
B_age 0.99 0.97, 1.01 0.4
B_ther_prev
    no
    yes 1.92 1.04, 3.53 0.036
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.01 0.51, 2.03 >0.9
    ICI + chemo/targeted therapy 1.94 1.01, 3.74 0.048
1 HR = Hazard Ratio, CI = Confidence Interval

6.2.3 OS~MET irresectable subgroup Cox regression - splines

Run initial regression.

cox_rcs.adj <- coxph(Surv(OS_days, OS_status)~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1,
                     data = OS_pal)

tbl_regression(cox_rcs.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
totmet_hpwk
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk 0.98 0.97, 0.99 0.004
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk' 1.02 1.00, 1.03 0.048
B_sex
    male
    female 1.15 0.66, 2.00 0.6
B_age 0.99 0.97, 1.01 0.3
B_ther_prev
    no
    yes 2.00 1.08, 3.72 0.028
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.03 0.52, 2.06 >0.9
    ICI + chemo/targeted therapy 2.03 1.06, 3.89 0.033
1 HR = Hazard Ratio, CI = Confidence Interval

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether totmet_hpwk improves the model fit at all (compare a model with splines for totmet_hpwk (Model 1) to a model without totmet_hpwk (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1
##    loglik  Chisq Df Pr(>|Chi|)   
## 1 -243.22                        
## 2 -248.33 10.227  2   0.006014 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Than, we check whether using splines for totmet_hpwk gives a better fit than adding it as a linear variable (compare a model with splines for totmet_hpwk (Model 1) to a model with totmet_hpwk as is (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)+totmet_hpwk))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1 + totmet_hpwk
##    loglik  Chisq Df Pr(>|Chi|)  
## 1 -243.22                       
## 2 -244.80 3.1687  1    0.07506 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Thus, using restricted cubic splines for totmet_hpwk does NOT improve our model fit..

Make dummy database.

newdf <- with(OS_pal,
              expand.grid(totmet_hpwk=c(seq(min(totmet_hpwk),max(totmet_hpwk),length=300), k.pos.totmet_hpwk)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = 0 #reference age for age (note, you could also use e.g. median, this wouldn't change the results)
                          , B_tumor_type.1 = levels(B_tumor_type.1)[1]
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          ))

Now we make a model matrix (= some technical stuff needed for next step).

mm <- model.matrix(cox_rcs.adj,data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("totmet_hpwk",colnames(mm))]] <- 0 # set all variables not about totmet_hpwk to 0

For each value of our sequence of totmet_hpwk, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% confidence intervals (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(totmet_hpwk=newdf$totmet_hpwk,eta=mm%*%coef(cox_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(cox_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain HR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for totmet_hpwk) and sort on totmet_hpwk.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$totmet_hpwk),] #sort based on var1 order

6.2.3.1 Make splines plot

Now, we can finally make the actual plot.

p_splines_OS_MET_pal <- ggplot(data = est
                )+ geom_ribbon(aes(x=totmet_hpwk, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=totmet_hpwk,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$totmet_hpwk%in%k.pos.totmet_hpwk,] #to plot the places of the knots
                              , aes(x=totmet_hpwk, y=eta)
                )+ geom_rug(data = OS_pal, aes(x=totmet_hpwk) #add rugs representing real observations for totmet_hpwk
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.1,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.1,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("total PA (MET-hours/week)" #add x-axis label
                )+ ylab("Hazard ratio for death" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )

p_splines_OS_MET_pal

6.3 OS~MVPASL irresectable subgroup

6.3.1 OS~MVPASL irresectable subgroup Kaplan Meier curves - tertiles

Kaplan Meier curve for tertiles (=unadjusted)

OS_pal$MZvt_HWK_tertiles_KM <- factor(OS_pal$MZvt_HWK_tertiles
                                          , levels = rev(levels(OS_pal$MZvt_HWK_tertiles)))
fit_tert.crude <- survfit(Surv(OS_days, OS_status)~ MZvt_HWK_tertiles_KM , 
                          data = OS_pal)
KM_OS_MVPASL_pal <- ggsurvplot(fit_tert.crude
                     ,conf.int = F
                     ,pval = F
                     ,xscale="d_m"
                     ,break.x.by=30.4375*6
                     # ,title=""
                     ,xlab="Time (months)"
                     ,ylab="Survival probability"
                     ,legend = c(.26,.24)
                     ,legend.title="MVPA-SL"
                     ,legend.labs=c(paste0("high ",
                                           levels(OS_pal$MZvt_HWK_tertiles_KM)[1]), 
                                    paste0("moderate ",
                                           levels(OS_pal$MZvt_HWK_tertiles_KM)[2]),
                                    paste0("low ",
                                           levels(OS_pal$MZvt_HWK_tertiles_KM)[3]))
                     ,axes.offset=F
                     ,surv.median.line = "none"
                     ,consor=T
                     ,censor.shape=124
                     ,censor.size=2
                     ,risk.table = T
                     ,risk.table.y.text=F
                     ,fontsize=3.5
                     ,ggtheme = theme_survminer(font.x = 12, font.y=12, font.main=12, font.tickslab=10, font.legend=10)
                     ,tables.theme = theme_cleantable(base_size = 10, font.x=10, font.y=10, font.main=12)
          )
KM_OS_MVPASL_pal
KM_OS_MVPASL_pal$plot <- KM_OS_MVPASL_pal$plot +
  theme(legend.background = element_rect(fill=alpha("white", alpha=0)))

6.3.2 OS~MVPASL irresectable subgroup Cox regression - tertiles

Adjusted Cox regression for tertiles linear variable

cox_tert.adj <- coxph(Surv(OS_days, OS_status)~ MZvt_HWK_tertiles + B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1,
                      data = OS_pal)
tbl_regression(cox_tert.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
MZvt_HWK_tertiles
    [0,1.5]
    (1.5,6.5] 0.80 0.45, 1.42 0.4
    (6.5,38.5] 0.37 0.16, 0.87 0.022
B_sex
    male
    female 0.97 0.56, 1.68 >0.9
B_age 1.00 0.98, 1.02 0.8
B_ther_prev
    no
    yes 2.02 1.10, 3.73 0.024
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.00 0.51, 1.99 >0.9
    ICI + chemo/targeted therapy 1.88 0.98, 3.58 0.056
1 HR = Hazard Ratio, CI = Confidence Interval

6.3.3 OS~MVPASL irresectable subgroup Cox regression - splines

Initial regression.

cox_rcs.adj <- coxph(Surv(OS_days, OS_status)~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1,
                     data = OS_pal)
tbl_regression(cox_rcs.adj, exponentiate = T)
Characteristic HR1 95% CI1 p-value
MZvt_HWK
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK 0.87 0.75, 1.02 0.092
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK' 1.11 0.78, 1.59 0.5
B_sex
    male
    female 0.95 0.55, 1.65 0.9
B_age 1.00 0.97, 1.02 0.7
B_ther_prev
    no
    yes 1.97 1.07, 3.61 0.029
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 0.97 0.49, 1.91 >0.9
    ICI + chemo/targeted therapy 1.82 0.96, 3.47 0.068
1 HR = Hazard Ratio, CI = Confidence Interval

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether MZvt_HWK improves the model fit at all (compare a model with splines for MZvt_HWK (Model 1) to a model without MZvt_HWK (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1
##    loglik  Chisq Df Pr(>|Chi|)  
## 1 -244.09                       
## 2 -248.33 8.4855  2    0.01437 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Than, we check whether using splines for MZvt_HWK gives a better fit than adding it as a linear variable (compare a model with splines for MZvt_HWK (Model 1) to a model with MZvt_HWK as is (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)+MZvt_HWK))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_ther_prev + B_ther_cur_regrouped.1 + MZvt_HWK
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -244.09                     
## 2 -244.26 0.3365  1     0.5618

Make dummy dataset.

newdf <- with(OS_pal,
              expand.grid(MZvt_HWK=c(seq(min(MZvt_HWK),max(MZvt_HWK),length=200), k.pos.MZvt_HWK)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = 0 #reference age for age (note, you could also use e.g. median, this wouldn't change the results)
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          ))

Now we make a model matrix (= some technical stuff needed for next step).

mm <- model.matrix(cox_rcs.adj,data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("MZvt_HWK",colnames(mm))]] <- 0 # set all variables not about MZvt_HWK to 0

For each value of our sequence of MZvt_HWK, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% conficence intervalls (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(MZvt_HWK=newdf$MZvt_HWK,eta=mm%*%coef(cox_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(cox_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain HR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for MZvt_HWK) and sort on MZvt_HWK.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$MZvt_HWK),] #sort based on var1 order

6.3.3.1 Make splines plot

p_splines_OS_MVPASL_pal <- ggplot(data = est
                )+ geom_ribbon(aes(x=MZvt_HWK, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=MZvt_HWK,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$MZvt_HWK%in%k.pos.MZvt_HWK,] #to plot the places of the knots
                              , aes(x=MZvt_HWK, y=eta)
                )+ geom_rug(data = OS_pal, aes(x=MZvt_HWK) #add rugs representing real observations for MZvt_HWK
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .5
                )+ scale_x_continuous(expand = c(0,0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.1,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 30) #set x-axis
                            ylim = c(0.1,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("MVPA-SL (hours/week)" #add x-axis label
                )+ ylab("Hazard ratio for death" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )
p_splines_OS_MVPASL_pal

6.4 Combine plots

Make Supplementary Figure 5.

KM_OS_MET_pal <- ggarrange(KM_OS_MET_pal$plot,KM_OS_MET_pal$table
                       , ncol=1, nrow=2
                       , heights= c(.75,.25))
KM_OS_MVPASL_pal <- ggarrange(KM_OS_MVPASL_pal$plot,KM_OS_MVPASL_pal$table
                       , ncol=1, nrow=2
                       , heights= c(.75,.25))

suppl_figure4 <- ggarrange(KM_OS_MET_pal, p_splines_OS_MET_pal
                     ,KM_OS_MVPASL_pal, p_splines_OS_MVPASL_pal
                     ,labels = c("A","B","C","D")
                     ,ncol=2, nrow=2
                     ,widths = c(1,1), heights = c(1,1)
                     ,common.legend=F
                     )
suppl_figure4

Save as PDF.


7 OS~PA - melanoma subgroup

Make OS_melanoma.

summary(data_OS$B_lab_LDH)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    94.0   173.8   196.0   218.8   226.5  1124.0       3
table(data_OS$B_tumor_type.1, useNA = "always")
## 
## melanoma    NSCLC      RCC    other     <NA> 
##      153       38       28       32        0
OS_melanoma <- data_OS[data_OS$B_tumor_type.1 %in% "melanoma",]
summary(OS_melanoma$B_tumor_type.1)
## melanoma    NSCLC      RCC    other 
##      153        0        0        0
table(OS_melanoma$B_ther_cur_regrouped.1, useNA = "always")
## 
##                 anti-PD-(L)1                         cICI 
##                          119                           34 
## ICI + chemo/targeted therapy                         <NA> 
##                            0                            0
OS_melanoma$B_ther_cur_regrouped.1 <- droplevels(OS_melanoma$B_ther_cur_regrouped.1)
OS_melanoma[is.na(OS_melanoma$B_lab_LDH),c("record_id","B_lab_LDH")]
## [1] record_id B_lab_LDH
## <0 rows> (or 0-length row.names)
hist(OS_melanoma$B_lab_LDH)

# OS_melanoma[OS_melanoma$B_lab_LDH=="1124",]$record_id #checked --> correct

We checked the outlier with LDH of 1124U/L in electronic patient files and it is correct.

7.1 Baseline characteristics

Specify data frame to use

OS_melanoma$B_lab_LDH_ULN<- cut(OS_melanoma$B_lab_LDH, breaks = c(0, 250, 500, (max(OS_melanoma$B_lab_LDH)+1)),
                        labels = c("<1x ULN", "1-2x ULN", ">2x ULN"))
table(OS_melanoma$B_lab_LDH_ULN, useNA = "always")
## 
##  <1x ULN 1-2x ULN  >2x ULN     <NA> 
##      136       14        3        0
data <- OS_melanoma

Relabel variables and add units that you want to use in your table (all formatting to let the table look nicely):

label(data$B_sex) <- "sex"
label(data$B_age) <- "age"
units(data$B_age) <- "years"
label(data$B_PS) <- "ECOG performance status"
label(data$B_tumor_type.1) <- "tumor type"
label(data$B_tumor_stage_dich) <- "stage"
label(data$B_paladj) <- "treatment intent"
label(data$B_ther_prev) <- "previous systemic therapy"
label(data$B_ther_cur_regrouped.1) <- "therapy"
label(data$T1_gr3_1y) <- "severe irAE within 1 year"
label(data$B_lab_LDH) <- "lactate dehydrogenase"
label (data$B_lab_LDH_ULN) <- "lactate dehydrogenase"
label(data$totmet_hpwk_tertiles) <- "MET-hours per week"
label(data$totmet_hpwk) <- "MET-hours"
units(data$totmet_hpwk) <- "hours/week"
label(data$MZvt_HWK) <- "MVPA-SL"
units(data$MZvt_HWK) <- "hours/week"

Make the actual table:

  • ~ B_sex + B_age + ... + B_ther_cur_regrouped.1 defines the rows;
  • | totmet_hpwk_tertiles defines the columns;
  • the rest are just to change the appearance of the table.
table1(~ totmet_hpwk + MZvt_HWK + B_sex + B_age + B_PS + B_tumor_type.1 + B_tumor_stage_dich + B_lab_LDH_ULN + B_paladj + B_ther_prev + B_ther_cur_regrouped.1| totmet_hpwk_tertiles, 
       data = data, 
       overall = "Overall", 
       # caption = "Table 1: baseline characteristics of combined ipilimumab",
       # footnote = "RCC: renal cell carcinoma",
       topclass = "Rtable1-zebra",
       render.categorical = my.render.cat,
       render.continuous = my.render.cont,#c(.="median [Q1, Q3]"),
       render.missing = my.render.missing,
       droplevels = TRUE)
[0,51]
(N=40)
(51,101]
(N=54)
(101,371]
(N=59)
Overall
(N=153)
MET-hours (hours/week)
median [Q1-Q3] 38.2 [28.0-41.7] 75.4 [62.0-87.0] 132.3 [112.0-171.7] 83.8 [47.3-119.8]
mean (SD) 32.6 (12.9) 75.4 (13.8) 151.8 (56.1) 93.7 (61.0)
MVPA-SL (hours/week)
median [Q1-Q3] 1.0 [0.0-3.5] 4.6 [1.5-6.5] 10.0 [5.0-17.7] 4.7 [1.0-9.5]
mean (SD) 1.8 (2.3) 4.9 (4.1) 11.3 (8.4) 6.5 (7.1)
sex
male 20 (50.0%) 40 (74.1%) 37 (62.7%) 97 (63.4%)
female 20 (50.0%) 14 (25.9%) 22 (37.3%) 56 (36.6%)
age (years)
median [Q1-Q3] 67.5 [59.8-72.3] 64.0 [52.5-73.0] 62.0 [54.5-71.5] 64.0 [54.0-73.0]
mean (SD) 63.1 (14.9) 62.2 (13.0) 61.5 (12.2) 62.2 (13.2)
ECOG performance status
0 18 (45.0%) 30 (55.6%) 42 (71.2%) 90 (58.8%)
1 20 (50.0%) 21 (38.9%) 13 (22.0%) 54 (35.3%)
2 2 (5.0%) 1 (1.9%) 2 (3.4%) 5 (3.3%)
3 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
4 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
unknown 0 (0.0%) 2 (3.7%) 2 (3.4%) 4 (2.6%)
tumor type
melanoma 40 (100.0%) 54 (100.0%) 59 (100.0%) 153 (100.0%)
NSCLC 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
RCC 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
other 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
stage
III 17 (42.5%) 33 (61.1%) 38 (64.4%) 88 (57.5%)
IV 22 (55.0%) 21 (38.9%) 21 (35.6%) 64 (41.8%)
other 1 (2.5%) 0 (0.0%) 0 (0.0%) 1 (0.7%)
lactate dehydrogenase
<1x ULN 31 (77.5%) 53 (98.1%) 52 (88.1%) 136 (88.9%)
1-2x ULN 8 (20.0%) 1 (1.9%) 5 (8.5%) 14 (9.2%)
>2x ULN 1 (2.5%) 0 (0.0%) 2 (3.4%) 3 (2.0%)
treatment intent
palliative 24 (60.0%) 19 (35.2%) 20 (33.9%) 63 (41.2%)
adjuvant 16 (40.0%) 35 (64.8%) 39 (66.1%) 90 (58.8%)
previous systemic therapy
no 37 (92.5%) 52 (96.3%) 55 (93.2%) 144 (94.1%)
yes 3 (7.5%) 2 (3.7%) 4 (6.8%) 9 (5.9%)
therapy
anti-PD-(L)1 28 (70.0%) 45 (83.3%) 46 (78.0%) 119 (77.8%)
cICI 12 (30.0%) 9 (16.7%) 13 (22.0%) 34 (22.2%)

Indeed we selected only melanoma patients.

7.2 OS~MET melanoma subgroup

To confirm the results for overall survival we will perform the Cox proportional hazard regression and KM curves, to asses the correlation of total MET hours per week (totmet_hpwk_tertiles) and MVPA-SL (MZvt_HWK_tertiles), in different functional forms - categorial: tertiles by sample size of overall dataset - continuous: splines to see whether shape is similar to all patients

7.2.1 OS~MET melanoma subgroup Kaplan Meier curves - tertiles

We first flip the order of levels of totmet_hpwk_tertiles, since it appears that patients with highest values have better survival and thus it is more intuitive.

OS_melanoma$totmet_hpwk_tertiles_KM <- factor(OS_melanoma$totmet_hpwk_tertiles
                                          , levels = rev(levels(OS_melanoma$totmet_hpwk_tertiles)))
fit_tert.crude <- survfit(Surv(OS_days, OS_status)~ totmet_hpwk_tertiles_KM , 
                          data = OS_melanoma)
KM_OS_MET_mel <- ggsurvplot(fit_tert.crude
                     ,conf.int = F
                     ,pval = F
                     ,xscale="d_m"
                     ,break.x.by=30.4375*6
                     # ,title=""
                     ,xlab="Time (months)"
                     ,ylab="Survival probability"
                     ,legend = c(.26,.24)
                     ,legend.title="total PA"
                     ,legend.labs=c(paste0("high ",
                                           levels(OS_melanoma$totmet_hpwk_tertiles_KM)[1]), 
                                    paste0("moderate ",
                                           levels(OS_melanoma$totmet_hpwk_tertiles_KM)[2]),
                                    paste0("low ",
                                           levels(OS_melanoma$totmet_hpwk_tertiles_KM)[3]))
                     ,axes.offset=F
                     ,surv.median.line = "none"
                     ,consor=T
                     ,censor.shape=124
                     ,censor.size=2
                     ,risk.table = T
                     ,risk.table.y.text=F
                     ,fontsize=3.5
                     ,ggtheme = theme_survminer(font.x = 12, font.y=12, font.main=12, font.tickslab=10, font.legend=10)
                     ,tables.theme = theme_cleantable(base_size = 10, font.x=10, font.y=10, font.main=12)
          )
KM_OS_MET_mel
KM_OS_MET_mel$plot <- KM_OS_MET_mel$plot +
  theme(legend.background = element_rect(fill=alpha("white", alpha=0)))

7.2.2 OS~MET melanoma subgroup Cox regression - tertiles

Adjusted Cox regression for tertiles linear variable

cox_tert.adj <- coxph(Surv(OS_days, OS_status)~ totmet_hpwk_tertiles + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH, 
                      data = OS_melanoma)
tbl_regression(cox_tert.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
totmet_hpwk_tertiles
    [0,51]
    (51,101] 0.79 0.31, 2.03 0.6
    (101,371] 0.56 0.21, 1.50 0.2
B_sex
    male
    female 0.95 0.41, 2.17 0.9
B_age 0.99 0.97, 1.02 0.5
B_paladj
    palliative
    adjuvant 0.75 0.25, 2.25 0.6
B_ther_prev
    no
    yes 5.28 1.54, 18.1 0.008
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.59 0.46, 5.56 0.5
B_lab_LDH 1.00 1.00, 1.00 0.7
1 HR = Hazard Ratio, CI = Confidence Interval

7.2.3 OS~MET melanoma subgroup Cox regression - splines

Run initial regression.

cox_rcs.adj <- coxph(Surv(OS_days, OS_status)~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH,
                     data = OS_melanoma)

tbl_regression(cox_rcs.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
totmet_hpwk
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk 0.98 0.96, 1.00 0.10
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk' 1.02 1.00, 1.04 0.13
B_sex
    male
    female 0.91 0.40, 2.09 0.8
B_age 0.99 0.96, 1.01 0.4
B_paladj
    palliative
    adjuvant 0.70 0.24, 2.09 0.5
B_ther_prev
    no
    yes 5.03 1.48, 17.1 0.010
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.44 0.41, 5.08 0.6
B_lab_LDH 1.00 1.00, 1.00 0.7
1 HR = Hazard Ratio, CI = Confidence Interval

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether totmet_hpwk improves the model fit at all (compare a model with splines for totmet_hpwk (Model 1) to a model without totmet_hpwk (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH
##  Model 2: ~ B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -106.68                     
## 2 -107.93 2.5076  2     0.2854

Than, we check whether using splines for totmet_hpwk gives a better fit than adding it as a linear variable (compare a model with splines for totmet_hpwk (Model 1) to a model with totmet_hpwk as is (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)+totmet_hpwk))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH
##  Model 2: ~ B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH + totmet_hpwk
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -106.68                     
## 2 -107.72 2.0867  1     0.1486

Thus, using restricted cubic splines for totmet_hpwk does NOT improve our model fit..

Make dummy database.

newdf <- with(OS_melanoma,
              expand.grid(totmet_hpwk=c(seq(min(totmet_hpwk),max(totmet_hpwk),length=300), k.pos.totmet_hpwk)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = 0 #reference age for age (note, you could also use e.g. median, this wouldn't change the results)
                          , B_tumor_type.1 = levels(B_tumor_type.1)[1]
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          , B_lab_LDH = 0
                          ))

Now we make a model matrix (= some technical stuff needed for next step).

mm <- model.matrix(cox_rcs.adj,data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("totmet_hpwk",colnames(mm))]] <- 0 # set all variables not about totmet_hpwk to 0

For each value of our sequence of totmet_hpwk, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% confidence intervals (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(totmet_hpwk=newdf$totmet_hpwk,eta=mm%*%coef(cox_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(cox_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain HR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for totmet_hpwk) and sort on totmet_hpwk.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$totmet_hpwk),] #sort based on var1 order

7.2.3.1 Make splines plot

Now, we can finally make the actual plot.

p_splines_OS_MET_mel <- ggplot(data = est
                )+ geom_ribbon(aes(x=totmet_hpwk, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=totmet_hpwk,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$totmet_hpwk%in%k.pos.totmet_hpwk,] #to plot the places of the knots
                              , aes(x=totmet_hpwk, y=eta)
                )+ geom_rug(data = OS_melanoma, aes(x=totmet_hpwk) #add rugs representing real observations for totmet_hpwk
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.1,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.1,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("total PA (MET-hours/week)" #add x-axis label
                )+ ylab("Hazard ratio for death" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )

p_splines_OS_MET_mel

7.3 OS~MVPASL melanoma subgroup

7.3.1 OS~MVPASL melanoma subgroup Kaplan Meier curves - tertiles

Kaplan Meier curve for tertiles (=unadjusted)

OS_melanoma$MZvt_HWK_tertiles_KM <- factor(OS_melanoma$MZvt_HWK_tertiles
                                          , levels = rev(levels(OS_melanoma$MZvt_HWK_tertiles)))
fit_tert.crude <- survfit(Surv(OS_days, OS_status)~ MZvt_HWK_tertiles_KM , 
                          data = OS_melanoma)
KM_OS_MVPASL_mel <- ggsurvplot(fit_tert.crude
                     ,conf.int = F
                     ,pval = F
                     ,xscale="d_m"
                     ,break.x.by=30.4375*6
                     # ,title=""
                     ,xlab="Time (months)"
                     ,ylab="Survival probability"
                     ,legend = c(.26,.24)
                     ,legend.title="MVPA-SL"
                     ,legend.labs=c(paste0("high ",
                                           levels(OS_melanoma$MZvt_HWK_tertiles_KM)[1]), 
                                    paste0("moderate ",
                                           levels(OS_melanoma$MZvt_HWK_tertiles_KM)[2]),
                                    paste0("low ",
                                           levels(OS_melanoma$MZvt_HWK_tertiles_KM)[3]))
                     ,axes.offset=F
                     ,surv.median.line = "none"
                     ,consor=T
                     ,censor.shape=124
                     ,censor.size=2
                     ,risk.table = T
                     ,risk.table.y.text=F
                     ,fontsize=3.5
                     ,ggtheme = theme_survminer(font.x = 12, font.y=12, font.main=12, font.tickslab=10, font.legend=10)
                     ,tables.theme = theme_cleantable(base_size = 10, font.x=10, font.y=10, font.main=12)
          )
KM_OS_MVPASL_mel
KM_OS_MVPASL_mel$plot <- KM_OS_MVPASL_mel$plot +
  theme(legend.background = element_rect(fill=alpha("white", alpha=0)))

7.3.2 OS~MVPASL melanoma subgroup Cox regression - tertiles

Adjusted Cox regression for tertiles linear variable

cox_tert.adj <- coxph(Surv(OS_days, OS_status)~ MZvt_HWK_tertiles + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH,
                      data = OS_melanoma)
tbl_regression(cox_tert.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
MZvt_HWK_tertiles
    [0,1.5]
    (1.5,6.5] 0.48 0.18, 1.29 0.14
    (6.5,38.5] 0.50 0.18, 1.41 0.2
B_sex
    male
    female 0.76 0.31, 1.88 0.6
B_age 0.99 0.96, 1.02 0.5
B_paladj
    palliative
    adjuvant 0.68 0.23, 2.04 0.5
B_ther_prev
    no
    yes 4.86 1.36, 17.4 0.015
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.41 0.41, 4.88 0.6
B_lab_LDH 1.00 1.00, 1.00 0.7
1 HR = Hazard Ratio, CI = Confidence Interval

7.3.3 OS~MVPASL melanoma subgroup Cox regression - splines

Initial regression.

cox_rcs.adj <- coxph(Surv(OS_days, OS_status)~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH,
                     data = OS_melanoma)
tbl_regression(cox_rcs.adj, exponentiate = T)
Characteristic HR1 95% CI1 p-value
MZvt_HWK
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK 0.76 0.60, 0.96 0.020
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK' 1.55 1.06, 2.26 0.023
B_sex
    male
    female 0.80 0.33, 1.97 0.6
B_age 0.99 0.96, 1.02 0.5
B_paladj
    palliative
    adjuvant 0.61 0.21, 1.75 0.4
B_ther_prev
    no
    yes 4.83 1.45, 16.1 0.010
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.29 0.38, 4.33 0.7
B_lab_LDH 1.00 1.00, 1.00 0.8
1 HR = Hazard Ratio, CI = Confidence Interval

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether MZvt_HWK improves the model fit at all (compare a model with splines for MZvt_HWK (Model 1) to a model without MZvt_HWK (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH
##  Model 2: ~ B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH
##    loglik  Chisq Df Pr(>|Chi|)  
## 1 -105.09                       
## 2 -107.93 5.6934  2    0.05804 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Than, we check whether using splines for MZvt_HWK gives a better fit than adding it as a linear variable (compare a model with splines for MZvt_HWK (Model 1) to a model with MZvt_HWK as is (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)+MZvt_HWK))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH
##  Model 2: ~ B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_lab_LDH + MZvt_HWK
##    loglik  Chisq Df Pr(>|Chi|)  
## 1 -105.09                       
## 2 -107.75 5.3294  1    0.02097 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Make dummy dataset.

newdf <- with(OS_melanoma,
              expand.grid(MZvt_HWK=c(seq(min(MZvt_HWK),max(MZvt_HWK),length=200), k.pos.MZvt_HWK)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = 0 #reference age for age (note, you could also use e.g. median, this wouldn't change the results)
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          , B_lab_LDH = 0
                          ))

Now we make a model matrix (= some technical stuff needed for next step).

mm <- model.matrix(cox_rcs.adj,data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("MZvt_HWK",colnames(mm))]] <- 0 # set all variables not about MZvt_HWK to 0

For each value of our sequence of MZvt_HWK, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% conficence intervalls (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(MZvt_HWK=newdf$MZvt_HWK,eta=mm%*%coef(cox_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(cox_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain HR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for MZvt_HWK) and sort on MZvt_HWK.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$MZvt_HWK),] #sort based on var1 order

7.3.3.1 Make splines plot

p_splines_OS_MVPASL_mel <- ggplot(data = est
                )+ geom_ribbon(aes(x=MZvt_HWK, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=MZvt_HWK,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$MZvt_HWK%in%k.pos.MZvt_HWK,] #to plot the places of the knots
                              , aes(x=MZvt_HWK, y=eta)
                )+ geom_rug(data = OS_melanoma, aes(x=MZvt_HWK) #add rugs representing real observations for MZvt_HWK
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .5
                )+ scale_x_continuous(expand = c(0,0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.1,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 30) #set x-axis
                            ylim = c(0.1,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("MVPA-SL (hours/week)" #add x-axis label
                )+ ylab("Hazard ratio for death" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )
p_splines_OS_MVPASL_mel

7.4 Combine plots

Make Supplementary Figure 4.

KM_OS_MET_mel <- ggarrange(KM_OS_MET_mel$plot,KM_OS_MET_mel$table
                       , ncol=1, nrow=2
                       , heights= c(.75,.25))
KM_OS_MVPASL_mel <- ggarrange(KM_OS_MVPASL_mel$plot,KM_OS_MVPASL_mel$table
                       , ncol=1, nrow=2
                       , heights= c(.75,.25))

suppl_figure5 <- ggarrange(KM_OS_MET_mel, p_splines_OS_MET_mel
                     ,KM_OS_MVPASL_mel, p_splines_OS_MVPASL_mel
                     ,labels = c("A","B","C","D")
                     ,ncol=2, nrow=2
                     ,widths = c(1,1), heights = c(1,1)
                     ,common.legend=F
                     )
suppl_figure5

Save as PDF.


8 OS~PA - ECOG PS 0-1 subgroup

Make OS_PS.

OS_PS <- data_OS[data_OS$B_PS%in%c("0","1"),]
table(OS_PS$B_ther_cur_regrouped.1, useNA = "always")
## 
##                 anti-PD-(L)1                         cICI 
##                          151                           53 
## ICI + chemo/targeted therapy                         <NA> 
##                           23                            0
table(OS_PS$B_tumor_type.1, useNA = "always")
## 
## melanoma    NSCLC      RCC    other     <NA> 
##      144       34       20       29        0

8.1 Baseline characteristics

Specify data frame to use

data <- OS_PS

Relabel variables and add units that you want to use in your table (all formatting to let the table look nicely):

label(data$B_sex) <- "sex"
label(data$B_age) <- "age"
units(data$B_age) <- "years"
label(data$B_PS) <- "ECOG performance status"
label(data$B_tumor_type.1) <- "tumor type"
label(data$B_tumor_stage_dich) <- "stage"
label(data$B_paladj) <- "treatment intent"
label(data$B_ther_prev) <- "previous systemic therapy"
label(data$B_ther_cur_regrouped.1) <- "therapy"
label(data$T1_gr3_1y) <- "severe irAE within 1 year"
label(data$totmet_hpwk_tertiles) <- "total PA"
label(data$totmet_hpwk) <- "total PA"
units(data$totmet_hpwk) <- "hours/week"
label(data$MZvt_HWK) <- "MVPA-SL"
units(data$MZvt_HWK) <- "hours/week"

Make the actual table:

  • ~ B_sex + B_age + ... + B_ther_cur_regrouped.1 defines the rows;
  • | totmet_hpwk_tertiles defines the columns;
  • the rest are just to change the appearance of the table.
table1(~ totmet_hpwk + MZvt_HWK + B_sex + B_age + B_PS + B_tumor_type.1 + B_tumor_stage_dich + B_paladj + B_ther_prev + B_ther_cur_regrouped.1| totmet_hpwk_tertiles,
       data = data, 
       overall = "Overall", 
       # caption = "Table 1: baseline characteristics of combined ipilimumab",
       # footnote = "RCC: renal cell carcinoma",
       topclass = "Rtable1-zebra",
       render.categorical = my.render.cat,
       render.continuous = my.render.cont,#c(.="median [Q1, Q3]"),
       render.missing = my.render.missing,
       droplevels = TRUE)
[0,51]
(N=69)
(51,101]
(N=80)
(101,371]
(N=78)
Overall
(N=227)
total PA (hours/week)
median [Q1-Q3] 35.1 [22.9-41.3] 75.4 [62.0-85.9] 131.1 [112.6-173.0] 77.8 [42.2-113.6]
mean (SD) 30.8 (12.8) 74.6 (13.7) 149.8 (52.1) 87.1 (58.5)
MVPA-SL (hours/week)
median [Q1-Q3] 1.0 [0.0-3.0] 5.0 [1.9-8.1] 8.9 [4.0-15.0] 4.0 [1.0-8.5]
mean (SD) 1.6 (2.1) 5.6 (4.6) 10.5 (8.1) 6.1 (6.6)
sex
male 41 (59.4%) 58 (72.5%) 48 (61.5%) 147 (64.8%)
female 28 (40.6%) 22 (27.5%) 30 (38.5%) 80 (35.2%)
age (years)
median [Q1-Q3] 66.0 [59.0-73.0] 65.0 [54.0-73.0] 61.0 [54.0-69.0] 64.0 [54.0-72.0]
mean (SD) 63.9 (12.8) 62.8 (13.1) 60.5 (10.8) 62.4 (12.3)
ECOG performance status
0 31 (44.9%) 39 (48.8%) 56 (71.8%) 126 (55.5%)
1 38 (55.1%) 41 (51.2%) 22 (28.2%) 101 (44.5%)
2 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
3 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
4 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
unknown 0 (0.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%)
tumor type
melanoma 38 (55.1%) 51 (63.8%) 55 (70.5%) 144 (63.4%)
NSCLC 14 (20.3%) 8 (10.0%) 12 (15.4%) 34 (15.0%)
RCC 4 (5.8%) 9 (11.2%) 7 (9.0%) 20 (8.8%)
other 13 (18.8%) 12 (15.0%) 4 (5.1%) 29 (12.8%)
stage
III 22 (31.9%) 37 (46.2%) 39 (50.0%) 98 (43.2%)
IV 46 (66.7%) 41 (51.2%) 39 (50.0%) 126 (55.5%)
other 1 (1.4%) 2 (2.5%) 0 (0.0%) 3 (1.3%)
treatment intent
palliative 48 (69.6%) 44 (55.0%) 38 (48.7%) 130 (57.3%)
adjuvant 21 (30.4%) 36 (45.0%) 40 (51.3%) 97 (42.7%)
previous systemic therapy
no 51 (73.9%) 70 (87.5%) 66 (84.6%) 187 (82.4%)
yes 18 (26.1%) 10 (12.5%) 12 (15.4%) 40 (17.6%)
therapy
anti-PD-(L)1 44 (63.8%) 57 (71.2%) 50 (64.1%) 151 (66.5%)
cICI 16 (23.2%) 18 (22.5%) 19 (24.4%) 53 (23.3%)
ICI + chemo/targeted therapy 9 (13.0%) 5 (6.2%) 9 (11.5%) 23 (10.1%)

Indeed we selected only patients with performance status 0 or 1.

8.2 OS~MET ECOG PS 0-1 subgroup

8.2.1 OS~MET ECOG PS 0-1 subgroup Kaplan Meier curves - tertiles

We first flip the order of levels of totmet_hpwk_tertiles, since it appeart that patients with highest values have better survival and thus it is more intuitive.

OS_PS$totmet_hpwk_tertiles_KM <- factor(OS_PS$totmet_hpwk_tertiles
                                          , levels = rev(levels(OS_PS$totmet_hpwk_tertiles)))
fit_tert.crude <- survfit(Surv(OS_days, OS_status)~ totmet_hpwk_tertiles_KM , 
                          data = OS_PS)
KM_OS_MET_PS <- ggsurvplot(fit_tert.crude
                     ,conf.int = F
                     ,pval = F
                     ,xscale="d_m"
                     ,break.x.by=30.4375*6
                     # ,title=""
                     ,xlab="Time (months)"
                     ,ylab="Survival probability"
                     ,legend = c(.26,.24)
                     ,legend.title="total PA"
                     ,legend.labs=c(paste0("high ",
                                           levels(OS_PS$totmet_hpwk_tertiles_KM)[1]), 
                                    paste0("moderate ",
                                           levels(OS_PS$totmet_hpwk_tertiles_KM)[2]),
                                    paste0("low ",
                                           levels(OS_PS$totmet_hpwk_tertiles_KM)[3]))
                     ,axes.offset=F
                     ,surv.median.line = "none"
                     ,consor=T
                     ,censor.shape=124
                     ,censor.size=2
                     ,risk.table = T
                     ,risk.table.y.text=F
                     ,fontsize=3.5
                     ,ggtheme = theme_survminer(font.x = 12, font.y=12, font.main=12, font.tickslab=10, font.legend=10)
                     ,tables.theme = theme_cleantable(base_size = 10, font.x=10, font.y=10, font.main=12)
          )
KM_OS_MET_PS
KM_OS_MET_PS$plot <- KM_OS_MET_PS$plot +
  theme(legend.background = element_rect(fill=alpha("white", alpha=0)))

8.2.2 OS~MET ECOG PS 0-1 subgroup Cox regression - tertiles

Adjusted Cox regression for tertiles linear variable

cox_tert.adj <- coxph(Surv(OS_days, OS_status)~ totmet_hpwk_tertiles + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, 
                      data = OS_PS)
tbl_regression(cox_tert.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
totmet_hpwk_tertiles
    [0,51]
    (51,101] 0.74 0.39, 1.38 0.3
    (101,371] 0.61 0.32, 1.15 0.12
B_sex
    male
    female 1.53 0.90, 2.59 0.12
B_age 0.99 0.97, 1.01 0.6
B_paladj
    palliative
    adjuvant 0.52 0.24, 1.09 0.081
B_ther_prev
    no
    yes 1.97 1.07, 3.64 0.030
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.14 0.52, 2.49 0.7
    ICI + chemo/targeted therapy 2.22 1.06, 4.64 0.034
1 HR = Hazard Ratio, CI = Confidence Interval

8.2.3 OS~MET ECOG PS 0-1 subgroup Cox regression - splines

Run initial regression.

cox_rcs.adj <- coxph(Surv(OS_days, OS_status)~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1,
                     data = OS_PS)

tbl_regression(cox_rcs.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
totmet_hpwk
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk 0.99 0.98, 1.00 0.067
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk' 1.01 1.00, 1.03 0.10
B_sex
    male
    female 1.51 0.89, 2.56 0.13
B_age 0.99 0.97, 1.01 0.5
B_paladj
    palliative
    adjuvant 0.51 0.24, 1.07 0.076
B_ther_prev
    no
    yes 2.01 1.08, 3.71 0.027
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.10 0.50, 2.43 0.8
    ICI + chemo/targeted therapy 2.22 1.08, 4.59 0.031
1 HR = Hazard Ratio, CI = Confidence Interval

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether totmet_hpwk improves the model fit at all (compare a model with splines for totmet_hpwk (Model 1) to a model without totmet_hpwk (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -265.88                     
## 2 -267.50 3.2339  2     0.1985

Than, we check whether using splines for totmet_hpwk gives a better fit than adding it as a linear variable (compare a model with splines for totmet_hpwk (Model 1) to a model with totmet_hpwk as is (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)+totmet_hpwk))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + totmet_hpwk
##    loglik Chisq Df Pr(>|Chi|)
## 1 -265.88                    
## 2 -267.10 2.438  1     0.1184

Thus, using restricted cubic splines for totmet_hpwk does NOT improve our model fit..

Make dummy database.

newdf <- with(OS_PS,
              expand.grid(totmet_hpwk=c(seq(min(totmet_hpwk),max(totmet_hpwk),length=300), k.pos.totmet_hpwk)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = 0 #reference age for age (note, you could also use e.g. median, this wouldn't change the results)
                          , B_tumor_type.1 = levels(B_tumor_type.1)[1]
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          ))

Now we make a model matrix (= some technical stuff needed for next step).

mm <- model.matrix(cox_rcs.adj,data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("totmet_hpwk",colnames(mm))]] <- 0 # set all variables not about totmet_hpwk to 0

For each value of our sequence of totmet_hpwk, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% confidence intervals (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(totmet_hpwk=newdf$totmet_hpwk,eta=mm%*%coef(cox_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(cox_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain HR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for totmet_hpwk) and sort on totmet_hpwk.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$totmet_hpwk),] #sort based on var1 order

8.2.3.1 Make splines plot

Now, we can finally make the actual plot.

p_splines_OS_MET_PS <- ggplot(data = est
                )+ geom_ribbon(aes(x=totmet_hpwk, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=totmet_hpwk,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$totmet_hpwk%in%k.pos.totmet_hpwk,] #to plot the places of the knots
                              , aes(x=totmet_hpwk, y=eta)
                )+ geom_rug(data = OS_PS, aes(x=totmet_hpwk) #add rugs representing real observations for totmet_hpwk
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.1,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.1,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("total PA (MET-hours/week)" #add x-axis label
                )+ ylab("Hazard ratio for death" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )

p_splines_OS_MET_PS

8.3 OS~MVPASL ECOG PS 0-1 subgroup

8.3.1 OS~MVPASL ECOG PS 0-1 subgroup Kaplan Meier curves - tertiles

Kaplan Meier curve for tertiles (=unadjusted)

OS_PS$MZvt_HWK_tertiles_KM <- factor(OS_PS$MZvt_HWK_tertiles
                                          , levels = rev(levels(OS_PS$MZvt_HWK_tertiles)))
fit_tert.crude <- survfit(Surv(OS_days, OS_status)~ MZvt_HWK_tertiles_KM , 
                          data = OS_PS)
KM_OS_MVPASL_PS <- ggsurvplot(fit_tert.crude
                     ,conf.int = F
                     ,pval = F
                     ,xscale="d_m"
                     ,break.x.by=30.4375*6
                     # ,title=""
                     ,xlab="Time (months)"
                     ,ylab="Survival probability"
                     ,legend = c(.26,.24)
                     ,legend.title="MVPA-SL"
                     ,legend.labs=c(paste0("high ",
                                           levels(data_OS$MZvt_HWK_tertiles_KM)[1]), 
                                    paste0("moderate ",
                                           levels(data_OS$MZvt_HWK_tertiles_KM)[2]),
                                    paste0("low ",
                                           levels(data_OS$MZvt_HWK_tertiles_KM)[3]))
                     ,axes.offset=F
                     ,surv.median.line = "none"
                     ,consor=T
                     ,censor.shape=124
                     ,censor.size=2
                     ,risk.table = T
                     ,risk.table.y.text=F
                     ,fontsize=3.5
                     ,ggtheme = theme_survminer(font.x = 12, font.y=12, font.main=12, font.tickslab=10, font.legend=10)
                     ,tables.theme = theme_cleantable(base_size = 10, font.x=10, font.y=10, font.main=12)
          )
KM_OS_MVPASL_PS
KM_OS_MVPASL_PS$plot <- KM_OS_MVPASL_PS$plot +
  theme(legend.background = element_rect(fill=alpha("white", alpha=0)))

8.3.2 OS~MVPASL ECOG PS 0-1 subgroup Cox regression - tertiles

Adjusted Cox regression for tertiles linear variable

cox_tert.adj <- coxph(Surv(OS_days, OS_status)~ MZvt_HWK_tertiles + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1,
                      data = OS_PS)
tbl_regression(cox_tert.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
MZvt_HWK_tertiles
    [0,1.5]
    (1.5,6.5] 0.66 0.36, 1.20 0.2
    (6.5,38.5] 0.64 0.32, 1.30 0.2
B_sex
    male
    female 1.45 0.85, 2.47 0.2
B_age 1.00 0.97, 1.02 0.7
B_paladj
    palliative
    adjuvant 0.53 0.25, 1.12 0.095
B_ther_prev
    no
    yes 2.02 1.09, 3.75 0.026
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.11 0.51, 2.40 0.8
    ICI + chemo/targeted therapy 2.26 1.09, 4.66 0.028
1 HR = Hazard Ratio, CI = Confidence Interval

8.3.3 OS~MVPASL ECOG PS 0-1 subgroup Cox regression - splines

Initial regression.

cox_rcs.adj <- coxph(Surv(OS_days, OS_status)~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1,
                     data = OS_PS)
tbl_regression(cox_rcs.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
MZvt_HWK
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK 0.85 0.73, 0.98 0.026
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK' 1.31 1.02, 1.68 0.033
B_sex
    male
    female 1.43 0.84, 2.46 0.2
B_age 1.00 0.97, 1.02 0.7
B_paladj
    palliative
    adjuvant 0.51 0.24, 1.07 0.076
B_ther_prev
    no
    yes 2.09 1.12, 3.88 0.020
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 1.06 0.49, 2.29 0.9
    ICI + chemo/targeted therapy 2.14 1.04, 4.41 0.039
1 HR = Hazard Ratio, CI = Confidence Interval

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether MZvt_HWK improves the model fit at all (compare a model with splines for MZvt_HWK (Model 1) to a model without MZvt_HWK (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
##   loglik  Chisq Df Pr(>|Chi|)  
## 1 -265.0                       
## 2 -267.5 5.0038  2    0.08193 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Than, we check whether using splines for MZvt_HWK gives a better fit than adding it as a linear variable (compare a model with splines for MZvt_HWK (Model 1) to a model with MZvt_HWK as is (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)+MZvt_HWK))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + MZvt_HWK
##    loglik  Chisq Df Pr(>|Chi|)  
## 1 -265.00                       
## 2 -267.19 4.3749  1    0.03647 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Make dummy dataset.

newdf <- with(OS_PS,
              expand.grid(MZvt_HWK=c(seq(min(MZvt_HWK),max(MZvt_HWK),length=200), k.pos.MZvt_HWK)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = 0 #reference age for age (note, you could also use e.g. median, this wouldn't change the results)
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          ))

Now we make a model matrix (= some technical stuff needed for next step).

mm <- model.matrix(cox_rcs.adj,data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("MZvt_HWK",colnames(mm))]] <- 0 # set all variables not about MZvt_HWK to 0

For each value of our sequence of MZvt_HWK, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% conficence intervalls (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(MZvt_HWK=newdf$MZvt_HWK,eta=mm%*%coef(cox_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(cox_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain HR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for MZvt_HWK) and sort on MZvt_HWK.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$MZvt_HWK),] #sort based on var1 order
8.3.3.0.1 Make splines plot
p_splines_OS_MVPASL_PS <- ggplot(data = est
                )+ geom_ribbon(aes(x=MZvt_HWK, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=MZvt_HWK,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$MZvt_HWK%in%k.pos.MZvt_HWK,] #to plot the places of the knots
                              , aes(x=MZvt_HWK, y=eta)
                )+ geom_rug(data = OS_PS, aes(x=MZvt_HWK) #add rugs representing real observations for MZvt_HWK
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .5
                )+ scale_x_continuous(expand = c(0,0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.1,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 30) #set x-axis
                            ylim = c(0.1,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("MVPA-SL (hours/week)" #add x-axis label
                )+ ylab("Hazard ratio for death" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )
p_splines_OS_MVPASL_PS

8.4 Combine plots

Make Supplementary Figure 6.

KM_OS_MET_PS <- ggarrange(KM_OS_MET_PS$plot,KM_OS_MET_PS$table
                       , ncol=1, nrow=2
                       , heights= c(.75,.25))
KM_OS_MVPASL_PS <- ggarrange(KM_OS_MVPASL_PS$plot,KM_OS_MVPASL_PS$table
                       , ncol=1, nrow=2
                       , heights= c(.75,.25))

suppl_figure6 <- ggarrange(KM_OS_MET_PS, p_splines_OS_MET_PS
                     ,KM_OS_MVPASL_PS, p_splines_OS_MVPASL_PS
                     ,labels = c("A","B","C","D")
                     ,ncol=2, nrow=2
                     ,widths = c(1,1), heights = c(1,1)
                     ,common.legend=F
                     )
suppl_figure6

Save as PDF.


9 OS~PA - ECOG adjusted

Here, we will rerun the same analyses as primary analyses, but now with additional adjustment for ECOG performance status. ECOG performance status was missing for 6 patients (2.4%). Electronic patient files were checked, data seemed missing completely at random (MCAR). Given the low number of missing values, missingness under MCAR assumption and the analysis being a sensitivity analysis, a complete case analysis was considered appropriate.

9.1 MET-hours and overall survival - ECOG adjusted

9.1.1 OS~MET-hours ECOG adjusted Cox regression - tertiles

Adjusted Cox regression for tertiles linear variable.

cox_tert.adj <- coxph(Surv(OS_days, OS_status)~ totmet_hpwk_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, data = data_OS)
cox_tert.adj_PS <- coxph(Surv(OS_days, OS_status)~ totmet_hpwk_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri, data = data_OS)
tbl_merge(list(
  tbl_regression(cox_tert.adj, exponentiate=T),
  tbl_regression(cox_tert.adj_PS, exponentiate=T)
  ),
  tab_spanner = c("without ECOG PS", "with ECOG PS")
)
Characteristic without ECOG PS with ECOG PS
HR1 95% CI1 p-value HR1 95% CI1 p-value
totmet_hpwk_tertiles
    [0,51]
    (51,101] 0.58 0.32, 1.04 0.065 0.69 0.37, 1.29 0.2
    (101,371] 0.48 0.27, 0.89 0.019 0.60 0.32, 1.11 0.10
B_sex
    male
    female 1.31 0.78, 2.18 0.3 1.35 0.80, 2.27 0.3
B_age 0.99 0.97, 1.01 0.4 0.98 0.96, 1.00 0.082
B_tumor_type.1
    melanoma
    NSCLC 1.14 0.49, 2.64 0.8 1.45 0.65, 3.27 0.4
    RCC 3.53 1.33, 9.33 0.011 3.52 1.42, 8.70 0.006
    other 1.16 0.52, 2.61 0.7 1.22 0.54, 2.74 0.6
B_paladj
    palliative
    adjuvant 0.43 0.20, 0.92 0.030 0.62 0.28, 1.36 0.2
B_ther_prev
    no
    yes 1.87 1.03, 3.39 0.040 1.81 0.99, 3.29 0.053
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 0.57 0.22, 1.50 0.3 0.78 0.32, 1.92 0.6
    ICI + chemo/targeted therapy 1.86 0.88, 3.94 0.11 2.22 1.07, 4.62 0.032
B_PS_tri
    0
    1 1.58 0.87, 2.88 0.13
    2-3 6.21 2.69, 14.3 <0.001
1 HR = Hazard Ratio, CI = Confidence Interval

9.1.2 OS~MET-hours ECOG adjusted Cox regression - continuous linear

Adjusted Cox regression for continuous linear variable.

cox_lin.adj <- coxph(Surv(OS_days, OS_status)~ totmet_hpwk + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri, data = data_OS)
# tbl_regression(cox_lin.adj, exponentiate=T)

9.1.3 OS~MET-hours ECOG adjusted Cox regression - assumptions

Shoenfeld residuals plot: it should be possible to draw a horizontal line between the confidence bands. Otherwise, the Cox proportional hazard assumption is violated, meaning that the baseline hazard is not constant over time.

plot(cox.zph(cox_lin.adj, transform = "identity")[1])
abline(h=cox_lin.adj$coefficients[1], col="blue")

Martingale residuals can help decide upon the functional form of a dependent variable: This should give a more or less straight line. If not, consider a transformation of the dependent variable.

fit0 <- update(cox_lin.adj,~.- totmet_hpwk)
resid_X <- resid(fit0, type = "martingale")
if(is.null(fit0$na.action)){
  df.tmp <- data.frame(X=data_OS$totmet_hpwk, resid_X=resid_X)
} else {
  df.tmp <- data.frame(X=data_OS[-c(fit0$na.action),]$totmet_hpwk, resid_X=resid_X)
}
ggplot(data = df.tmp
       )+ geom_point(aes(x=X, y=resid_X), color="black", alpha=0.1
       )+ geom_smooth(aes(x=X, y=resid_X), method = loess
       )+ labs(x="totmet_hpwk", y="Martingale residuals")
## `geom_smooth()` using formula = 'y ~ x'

fit0 <- update(cox_lin.adj,~.- B_age)
resid_X <- resid(fit0, type = "martingale")
if(is.null(fit0$na.action)){
  df.tmp <- data.frame(X=data_OS$B_age, resid_X=resid_X)
} else {
  df.tmp <- data.frame(X=data_OS[-c(fit0$na.action),]$B_age, resid_X=resid_X)
}
ggplot(data = df.tmp
       )+ geom_point(aes(x=X, y=resid_X), color="black", alpha=0.1
       )+ geom_smooth(aes(x=X, y=resid_X), method = loess
       )+ labs(x="B_age", y="Martingale residuals")
## `geom_smooth()` using formula = 'y ~ x'

rm(fit0, resid_X, df.tmp)

9.1.4 OS~MET-hours ECOG adjusted Cox regression - splines

9.1.4.1 Initial regression

Conduct survival analysis using splines regression by the prespecified knots. We make use of the rcs() function in package rms which provides restricted cubic splines.

cox_rcs.adj <- coxph(Surv(OS_days, OS_status)~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri, data = data_OS)
tbl_regression(cox_rcs.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
totmet_hpwk
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk 0.99 0.98, 1.00 0.11
    rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)totmet_hpwk' 1.01 1.00, 1.02 0.14
B_sex
    male
    female 1.34 0.80, 2.24 0.3
B_age 0.98 0.96, 1.00 0.10
B_tumor_type.1
    melanoma
    NSCLC 1.41 0.63, 3.16 0.4
    RCC 3.32 1.34, 8.23 0.010
    other 1.18 0.52, 2.67 0.7
B_paladj
    palliative
    adjuvant 0.61 0.28, 1.33 0.2
B_ther_prev
    no
    yes 1.89 1.04, 3.44 0.038
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 0.77 0.31, 1.90 0.6
    ICI + chemo/targeted therapy 2.26 1.10, 4.65 0.027
B_PS_tri
    0
    1 1.62 0.90, 2.92 0.11
    2-3 5.79 2.42, 13.9 <0.001
1 HR = Hazard Ratio, CI = Confidence Interval
9.1.4.1.1 Check added value of using splines

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether totmet_hpwk improves the model fit at all (compare a model with splines for totmet_hpwk (Model 1) to a model without totmet_hpwk (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri
##  Model 2: ~ B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -316.70                     
## 2 -317.97 2.5534  2     0.2789

Then, we check whether using splines for totmet_hpwk gives a better fit than adding it as a linear variable (compare a model with splines for totmet_hpwk (Model 1) to a model with totmet_hpwk as is (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~totmet_hpwk+.-rms::rcs(totmet_hpwk, k.pos.totmet_hpwk)))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(totmet_hpwk, k.pos.totmet_hpwk) + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri
##  Model 2: ~ totmet_hpwk + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -316.70                     
## 2 -317.67 1.9459  1      0.163

9.1.4.2 Visualise the regression coefficient

We want to visualise the hazard rate (HR) for death (in case of overall survival (OS)) over the trajectory of our variable of interest (totmet_hpwk). To do this, we will mak a dummy dataset to predict the HR for certain values of totmet_hpwk. We will also obtain the 95% confidence interval (CI).

9.1.4.2.1 Make dummy dataset to obtain predicted HR
newdf <- with(data_OS,
              expand.grid(totmet_hpwk=c(seq(min(totmet_hpwk),max(totmet_hpwk),length=300), k.pos.totmet_hpwk)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = 0 #reference age for age (note, you could also use e.g. median, this wouldn't change the results)
                          , B_tumor_type.1 = levels(B_tumor_type.1)[1]
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          , B_PS_tri = levels(B_PS_tri)[1]
                          ))

Now we make a model matrix, which is needed in the subsequent steps.

mm <- model.matrix(cox_rcs.adj,data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("totmet_hpwk",colnames(mm))]] <- 0 # set all variables not about totmet_hpwk to 0

For each value of our sequence of totmet_hpwk, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% conficence intervalls (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(totmet_hpwk=newdf$totmet_hpwk,eta=mm%*%coef(cox_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(cox_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain HR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for totmet_hpwk) and sort on totmet_hpwk.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$totmet_hpwk),] #sort based on var1 order
9.1.4.2.2 Make splines plot
p_splines_OS_MET <- ggplot(data = est
                )+ geom_ribbon(aes(x=totmet_hpwk, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=totmet_hpwk,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth= 1
                )+ geom_point(data=est[est$totmet_hpwk%in%k.pos.totmet_hpwk,] #to plot the places of the knots
                              , aes(x=totmet_hpwk, y=eta)
                )+ geom_rug(data = data_OS, aes(x=totmet_hpwk) #add rugs representing real observations for totmet_hpwk
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .3
                )+ scale_x_continuous(expand = c(0, 0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.1,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 201) #set x-axis
                                   ylim = c(0.1,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("total PA (MET-hours/week)" #add x-axis label
                )+ ylab("Hazard ratio for death" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )

p_splines_OS_MET

9.2 MVPA-SL and overall survival - ECOG adjusted

9.2.1 OS~MVPA-SL ECOG adjusted Cox regression - tertiles

Adjusted Cox regression for tertiles linear variable.

cox_tert.adj <- coxph(Surv(OS_days, OS_status)~ MZvt_HWK_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, data = data_OS)
cox_tert.adj_PS <- coxph(Surv(OS_days, OS_status)~ MZvt_HWK_tertiles + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + B_PS_tri, data = data_OS)
tbl_merge(list(
  tbl_regression(cox_tert.adj, exponentiate=T),
  tbl_regression(cox_tert.adj_PS, exponentiate=T)
  ),
  tab_spanner = c("without ECOG PS", "with ECOG PS")
)
Characteristic without ECOG PS with ECOG PS
HR1 95% CI1 p-value HR1 95% CI1 p-value
MZvt_HWK_tertiles
    [0,1.5]
    (1.5,6.5] 0.70 0.41, 1.21 0.2 0.80 0.46, 1.40 0.4
    (6.5,38.5] 0.54 0.28, 1.04 0.066 0.66 0.33, 1.29 0.2
B_sex
    male
    female 1.18 0.70, 1.97 0.5 1.26 0.74, 2.13 0.4
B_age 1.00 0.98, 1.02 0.8 0.99 0.97, 1.01 0.2
B_tumor_type.1
    melanoma
    NSCLC 1.06 0.44, 2.54 0.9 1.40 0.62, 3.21 0.4
    RCC 3.29 1.24, 8.74 0.017 3.36 1.36, 8.33 0.009
    other 1.25 0.56, 2.80 0.6 1.29 0.57, 2.89 0.5
B_paladj
    palliative
    adjuvant 0.46 0.21, 1.00 0.050 0.68 0.31, 1.50 0.3
B_ther_prev
    no
    yes 2.01 1.10, 3.67 0.023 1.92 1.06, 3.51 0.033
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 0.60 0.23, 1.57 0.3 0.81 0.33, 2.00 0.6
    ICI + chemo/targeted therapy 1.99 0.93, 4.25 0.074 2.38 1.15, 4.92 0.019
B_PS_tri
    0
    1 1.66 0.92, 2.99 0.091
    2-3 6.75 2.95, 15.4 <0.001
1 HR = Hazard Ratio, CI = Confidence Interval

9.2.2 OS~MVPA-SL ECOG adjusted Cox regression - continuous linear

Adjusted Cox regression for continuous linear variable.

cox_lin.adj <- coxph(Surv(OS_days, OS_status)~ MZvt_HWK + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, data = data_OS)
# tbl_regression(cox_lin.adj, exponentiate=T)

9.2.3 OS~MVPA-SL ECOG adjusted Cox regression - assumptions

Shoenfeld residuals plot: it should be possible to draw a horizontal line between the confidence bands. Otherwise, the Cox proportional hazard assumption is violated, meaning that the baseline hazard is not constant over time.

plot(cox.zph(cox_lin.adj, transform = "identity")[1])
abline(h=cox_lin.adj$coefficients[1], col="blue")

Martingale residuals can help decide upon the functional form of a dependent variable: This should give a more or less straight line. If not, consider a transformation of the dependent variable.

fit0 <- update(cox_lin.adj,~.- MZvt_HWK)
resid_X <- resid(fit0, type = "martingale")
if(is.null(fit0$na.action)){
  df.tmp <- data.frame(X=data_OS$MZvt_HWK, resid_X=resid_X)
} else {
  df.tmp <- data.frame(X=data_OS[-c(fit0$na.action),]$MZvt_HWK, resid_X=resid_X)
}
ggplot(data = df.tmp
       )+ geom_point(aes(x=X, y=resid_X), color="black", alpha=0.1
       )+ geom_smooth(aes(x=X, y=resid_X), method = loess
       )+ labs(x="MZvt_HWK", y="Martingale residuals")
## `geom_smooth()` using formula = 'y ~ x'

fit0 <- update(cox_lin.adj,~.- B_age)
resid_X <- resid(fit0, type = "martingale")
if(is.null(fit0$na.action)){
  df.tmp <- data.frame(X=data_OS$B_age, resid_X=resid_X)
} else {
  df.tmp <- data.frame(X=data_OS[-c(fit0$na.action),]$B_age, resid_X=resid_X)
}
ggplot(data = df.tmp
       )+ geom_point(aes(x=X, y=resid_X), color="black", alpha=0.1
       )+ geom_smooth(aes(x=X, y=resid_X), method = loess
       )+ labs(x="B_age", y="Martingale residuals")
## `geom_smooth()` using formula = 'y ~ x'

rm(fit0, resid_X, df.tmp)

9.2.4 OS~MVPA-SL ECOG adjusted Cox regression - splines

9.2.4.1 Initial regression

Conduct survival analysis using splines regression by the prespecified knots. We make use of the rcs() function in package rms which provides restricted cubic splines.

cox_rcs.adj <- coxph(Surv(OS_days, OS_status)~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1, data = data_OS)
tbl_regression(cox_rcs.adj, exponentiate=T)
Characteristic HR1 95% CI1 p-value
MZvt_HWK
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK 0.85 0.74, 0.96 0.012
    rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)MZvt_HWK' 1.30 1.03, 1.63 0.025
B_sex
    male
    female 1.17 0.70, 1.97 0.6
B_age 1.00 0.98, 1.02 0.7
B_tumor_type.1
    melanoma
    NSCLC 1.05 0.44, 2.53 >0.9
    RCC 3.24 1.22, 8.60 0.018
    other 1.20 0.53, 2.70 0.7
B_paladj
    palliative
    adjuvant 0.44 0.20, 0.95 0.037
B_ther_prev
    no
    yes 2.06 1.12, 3.77 0.019
B_ther_cur_regrouped.1
    anti-PD-(L)1
    cICI 0.57 0.22, 1.49 0.3
    ICI + chemo/targeted therapy 1.93 0.91, 4.09 0.088
1 HR = Hazard Ratio, CI = Confidence Interval
9.2.4.1.1 Check added value of using splines

We can check whether using restricted cubic splines improves the model fit (statistically). A p-value <0.05 indicates that the model fit is statistically different.

We can check whether MZvt_HWK improves the model fit at all (compare a model with splines for MZvt_HWK (Model 1) to a model without MZvt_HWK (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
##    loglik Chisq Df Pr(>|Chi|)  
## 1 -327.03                      
## 2 -330.37 6.664  2    0.03572 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Than, we check whether using splines for MZvt_HWK gives a better fit than adding it as a linear variable (compare a model with splines for MZvt_HWK (Model 1) to a model with MZvt_HWK as is (Model 2)).

anova(cox_rcs.adj,update(cox_rcs.adj,.~.-rms::rcs(MZvt_HWK, k.pos.MZvt_HWK)+MZvt_HWK))
## Analysis of Deviance Table
##  Cox model: response is  Surv(OS_days, OS_status)
##  Model 1: ~ rms::rcs(MZvt_HWK, k.pos.MZvt_HWK) + B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1
##  Model 2: ~ B_sex + B_age + B_tumor_type.1 + B_paladj + B_ther_prev + B_ther_cur_regrouped.1 + MZvt_HWK
##    loglik  Chisq Df Pr(>|Chi|)  
## 1 -327.03                       
## 2 -329.43 4.7839  1    0.02873 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

9.2.4.2 Visualise the regression coefficient

We want to visualise the hazard rate (HR) for death (in case of overall survival (OS)) over the trajectory of our variable of interest (MZvt_HWK). To do this, we will mak a dummy dataset to predict the HR for certain values of MZvt_HWK. We will also obtain the 95% confidence interval (CI).

9.2.4.2.1 Make dummy dataset to obtain predicted HR
newdf <- with(data_OS,
              expand.grid(MZvt_HWK=c(seq(min(MZvt_HWK),max(MZvt_HWK),length=300), k.pos.MZvt_HWK)
                          , B_sex = levels(B_sex)[1] #reference level for sex
                          , B_age = 0 #reference age for age (note, you could also use e.g. median, this wouldn't change the results)
                          , B_tumor_type.1 = levels(B_tumor_type.1)[1]
                          , B_paladj = levels(B_paladj)[1]
                          , B_ther_prev = levels(B_ther_prev)[1]
                          , B_ther_cur_regrouped.1 = levels(B_ther_cur_regrouped.1)[1]
                          , B_PS_tri = levels(B_PS_tri)[1]
                          ))

Now we make a model matrix, which is needed in the subsequent steps.

mm <- model.matrix(cox_rcs.adj,data=newdf)
mm[,!colnames(mm) %in% colnames(mm)[grep("MZvt_HWK",colnames(mm))]] <- 0 # set all variables not about MZvt_HWK to 0

For each value of our sequence of MZvt_HWK, we obtain its predicted coefficient (=eta) and predicted standard error (=pse). With these, we compute the predicted 95% conficence intervalls (plo and phi for lower and upper CI). We store these in a data frame called est.

est <- data.frame(MZvt_HWK=newdf$MZvt_HWK,eta=mm%*%coef(cox_rcs.adj)) #predicted coefficient
pvar <- diag(mm %*% tcrossprod(vcov(cox_rcs.adj),mm)) #predicted variance
est <- data.frame(est,pse=sqrt(pvar)) #predicted standard error
est <- with(est,
            data.frame(est,
                       plo=eta-qnorm(0.975)*pse, #predicted lower 95%CI bound
                       phi=eta+qnorm(0.975)*pse)) #predicted upper 95%CI bound

For Cox regression and logistic regression we need to exponentiate the coefficient and confidence bounds to obtain HR and 95% CI.

est[,c("eta","plo","phi")] <- exp(est[,c("eta","plo","phi")])

We remove duplicates (e.g. if we have two of the same values for MZvt_HWK) and sort on MZvt_HWK.

est <- est[!duplicated(est),] #remove duplicates 
est <- est[order(est$MZvt_HWK),] #sort based on var1 order
9.2.4.2.2 Make splines plot
p_splines_OS_MVPASL <- ggplot(data = est
                )+ geom_ribbon(aes(x=MZvt_HWK, ymin = plo, ymax = phi) #color 95%CI
                               , colour = rgb(248,136,44,maxColorValue = 255,alpha=100)
                               , fill=rgb(252,211,178,maxColorValue = 255,alpha=100)
                               , alpha=0.4
                )+ geom_hline(yintercept=1 #add horizontal line at 1
                              , color="dark grey"
                              , linetype="dashed"
                              , linewidth=1
                )+ geom_line(aes(x=MZvt_HWK,y=eta) #add estimated HR
                             , color = "coral3"
                             , linewidth = 1
                )+ geom_point(data=est[est$MZvt_HWK%in%k.pos.MZvt_HWK,] #to plot the places of the knots
                              , aes(x=MZvt_HWK, y=eta)
                )+ geom_rug(data = data_OS, aes(x=MZvt_HWK) #add rugs representing real observations for MZvt_HWK
                            , sides = "b"
                            , length = grid::unit(0.02, "npc")
                            , color = "grey"
                            , alpha = .5
                )+ scale_x_continuous(expand = c(0,0) #set axes directly at 0
                )+ scale_y_log10(breaks = unique(c(0.1,seq(0,1,.2), seq(0,1,.2)*10))
                # )+ annotation_logticks(base = 10, sides = "l", outside=F , scaled = T, color = "black",alpha = .5
                )+ coord_cartesian(#xlim = c(0, 30) #set x-axis
                            ylim = c(0.1,10) #set y-axis
                # )+ ggtitle("" #add title
                )+ xlab("MVPA-SL (hours/week)" #add x-axis label
                )+ ylab("Hazard ratio for death" #add y-axis label
                )+ theme_light( #add/alter theme
                )+ theme(plot.margin = margin(0.5,3,0.5,0.5, unit = "char")
                )

p_splines_OS_MVPASL

9.3 Combine plots

Make Supplementary Figure 7.

suppl_figure7 <- ggarrange(p_splines_OS_MET , p_splines_OS_MVPASL
                     ,labels = c("A","B")
                     ,ncol=2, nrow=1
                     ,widths = c(1,1), heights = c(1,1)
                     ,common.legend=F
                     )
suppl_figure7

Save as PDF.


10 Session Info

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.5.2
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Amsterdam
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggsurvfit_0.3.0  survminer_0.4.9  ggpubr_0.6.0     tidycmprsk_0.2.0
##  [5] cmprsk_2.2-11    survival_3.5-5   rms_6.7-0        Hmisc_5.1-0     
##  [9] ggplot2_3.4.2    gtsummary_1.7.1  table1_1.4.3     tidyr_1.3.0     
## [13] dplyr_1.1.2     
## 
## loaded via a namespace (and not attached):
##  [1] gridExtra_2.3        sandwich_3.0-2       rlang_1.1.1         
##  [4] magrittr_2.0.3       multcomp_1.4-25      polspline_1.1.22    
##  [7] compiler_4.3.1       mgcv_1.8-42          systemfonts_1.0.4   
## [10] vctrs_0.6.3          quantreg_5.95        stringr_1.5.0       
## [13] pkgconfig_2.0.3      fastmap_1.1.1        backports_1.4.1     
## [16] ellipsis_0.3.2       labeling_0.4.2       KMsurv_0.1-5        
## [19] utf8_1.2.3           rmarkdown_2.22       markdown_1.7        
## [22] haven_2.5.2          ragg_1.2.5           MatrixModels_0.5-1  
## [25] purrr_1.0.1          xfun_0.39            cachem_1.0.8        
## [28] labelled_2.11.0      jsonlite_1.8.5       highr_0.10          
## [31] broom_1.0.5          cluster_2.1.4        R6_2.5.1            
## [34] bslib_0.5.0          stringi_1.7.12       car_3.1-2           
## [37] rpart_4.1.19         jquerylib_0.1.4      Rcpp_1.0.10         
## [40] knitr_1.43           zoo_1.8-12           base64enc_0.1-3     
## [43] Matrix_1.5-4.1       splines_4.3.1        nnet_7.3-19         
## [46] tidyselect_1.2.0     rstudioapi_0.14      abind_1.4-5         
## [49] yaml_2.3.7           ggtext_0.1.2         codetools_0.2-19    
## [52] lattice_0.21-8       tibble_3.2.1         withr_2.5.0         
## [55] evaluate_0.21        foreign_0.8-84       xml2_1.3.4          
## [58] survMisc_0.5.6       pillar_1.9.0         carData_3.0-5       
## [61] checkmate_2.2.0      generics_0.1.3       hms_1.1.3           
## [64] munsell_0.5.0        commonmark_1.9.0     scales_1.2.1        
## [67] xtable_1.8-4         glue_1.6.2           tools_4.3.1         
## [70] data.table_1.14.8    SparseM_1.81         ggsignif_0.6.4      
## [73] forcats_1.0.0        mvtnorm_1.2-2        cowplot_1.1.1       
## [76] grid_4.3.1           colorspace_2.1-0     patchwork_1.1.2     
## [79] nlme_3.1-162         htmlTable_2.4.1      Formula_1.2-5       
## [82] cli_3.6.1            km.ci_0.5-6          textshaping_0.3.6   
## [85] fansi_1.0.4          broom.helpers_1.13.0 gt_0.9.0            
## [88] gtable_0.3.3         rstatix_0.7.2        sass_0.4.6          
## [91] digest_0.6.31        TH.data_1.1-2        htmlwidgets_1.6.2   
## [94] farver_2.1.1         htmltools_0.5.5      lifecycle_1.0.3     
## [97] hardhat_1.3.0        gridtext_0.1.5       MASS_7.3-60